♦ Chapter 1 ♦

WHAT IS STATISTICS? 


WE MUDDLE THROUGH LIFE MAKING CHOICES
BASED ON INCOMPLETE INFORMATION...


SHOULD I HAVE THE SOUP?
EVERYTHING ELSE IS SO
EXPENSIVE, AND I DON'T
KNOW WHO'S PAYING. ARE
STATISTICIANS STINGY? I'VE
NEVER GONE OUT WITH
ONE BEFORE, THOUGH I
ONCE KNEW A VERY
GENEROUS ACCOUNTANT...


SHOULD I HAVE THE SOUP?
27 OUT OF THE 36 TIMES
I'VE HAD IT, IT WAS PRETTY
GOOD... BUT IS MONDAY THE
REGULAR CHEF'S NIGHT
OFF? AND WHAT IF ALL THE
AIR MOLECULES IN THE
ROOM SUDDENLY FLY UP TO
THE CEILING?





MOST OF US LIVE 
COMFORTABLY WITH SOME 
LEVEL OF UNCERTAINTY. 


I'LL HAVE THE SOUP, PLEASE.


AARGH. COULD YOU JUST BRING ME A CALCULATOR?


WHAT MAKES STATISTICS UNIQUE IS ITS ABILITY TO QUANTIFY UNCERTAINTY,
TO MAKE IT PRECISE. THIS ALLOWS STATISTICIANS TO MAKE CATEGORICAL
STATEMENTS, WITH COMPLETE ASSURANCE, ABOUT THEIR LEVEL OF
UNCERTAINTY!


GOOD CHOICE! I'M 95%
CONFIDENT THAT TONIGHT'S
SOUP HAS PROBABILITY
BETWEEN 73% AND 77% OF
BEING REALLY DELICIOUS!

THIS IS NOT JUST A MATTER OF
ORDERING SOUP! STATISTICS ALSO
INVOLVES MATTERS OF LIFE AND
DEATH...


HEY, HAVE YOU EVER HAD THE SOUP HERE ON AN OFF NIGHT?


FOR EXAMPLE, IN 1986, THE SPACE SHUTTLE CHALLENGER EXPLODED, KILLING
SEVEN ASTRONAUTS. THE DECISION TO LAUNCH IN 29-DEGREE WEATHER HAD
BEEN MADE WITHOUT DOING A SIMPLE ANALYSIS OF PERFORMANCE DATA AT
LOW TEMPERATURE.





A MORE POSITIVE EXAMPLE IS THE SALK POLIO VACCINE. IN 1954, VACCINE
TRIALS WERE PERFORMED ON SOME 400,000 CHILDREN, WITH STRICT CONTROLS
TO ELIMINATE BIASED RESULTS. GOOD STATISTICAL ANALYSIS OF THE RESULTS
FIRMLY ESTABLISHED THE VACCINE'S EFFECTIVENESS, AND TODAY POLIO IS
ALMOST UNKNOWN.
TO ACCOMPLISH THEIR FEATS OF MATHEMATICAL
LEGERDEMAIN, STATISTICIANS RELY ON THREE
RELATED DISCIPLINES:


Data analysis

THE GATHERING, DISPLAY, AND
SUMMARY OF DATA.

Probability

THE LAWS OF CHANCE, IN
AND OUT OF THE CASINO.

Statistical inference

THE SCIENCE OF DRAWING
STATISTICAL CONCLUSIONS
FROM SPECIFIC DATA USING A
KNOWLEDGE OF PROBABILITY.



IN THIS BOOK, WE'LL LOOK AT ALL THREE, AS APPLIED TO A WIDE VARIETY OF
SITUATIONS WHERE STATISTICS PLAYS A CRUCIAL ROLE IN THE MODERN WORLD.





IN CHAPTER 2, WE'LL LOOK AT A
SIMPLE DATA SET, THE REPORTED
WEIGHTS OF A BUNCH OF COLLEGE
STUDENTS.


IN CHAPTER 3, WE STUDY THE LAWS OF
PROBABILITY IN THEIR BIRTHPLACE, THE
GAMBLING DEN.

(IN THE 17TH CENTURY)


CHAPTERS 4 AND 5 SHOW HOW TO
DESCRIBE THE WORLD WITH
PROBABILITY MODELS, USING THE
CONCEPT OF THE RANDOM VARIABLE.


IN CHAPTER 7 AND
BEYOND, WE DESCRIBE
HOW TO MAKE
STATISTICAL INFERENCES
IN SUCH COMMON REAL-WORLD
ARENAS AS
ELECTION POLLING,
MANUFACTURING QUALITY
CONTROL, MEDICAL
TESTING, ENVIRONMENTAL
MONITORING, RACIAL
BIAS, AND THE LAW.
FINALLY, IN DISCUSSING
STATISTICS, IT'S HARD TO
AVOID MENTIONING ONE
OTHER THING: THE
WIDESPREAD MISTRUST OF
STATISTICS IN THE WORLD
TODAY. EVERYONE KNOWS
ABOUT "LYING WITH
STATISTICS," WHILE GOOD
STATISTICAL ANALYSIS IS
NEARLY IMPOSSIBLE TO FIND
IN DAILY LIFE. WHAT'S ONE
TO DO?


3 OUT OF 4 DOCTORS
RECOMMEND NOT BELIEVING
ANY STATEMENT BEGINNING
WITH "3 OUT OF 4 DOCTORS..."



OUR HUMBLE OPINION IS THAT LEARNING A LITTLE MORE ABOUT THE
SUBJECT MIGHT NOT BE SUCH A BAD IDEA, AND THAT'S WHY WE WROTE THIS
BOOK!


IN WHAT FOLLOWS, WE TRY TO PRESENT THE ELEMENTS OF STATISTICS AS
GRAPHICALLY AND INTUITIVELY AS POSSIBLE. ALL YOU NEED TO GET THROUGH
IT IS A LITTLE PATIENCE, SOME THOUGHT, AND A CERTAIN TOLERANCE FOR
ALGEBRA. OH, IF NOT THAT, THEN MAYBE A COURSE REQUIREMENT!!











♦ Chapter 2 ♦

DATA DESCRIPTION 


UM... WELL... IT'S... THEY'RE...
DATA ARE THE STATISTICIAN'S
RAW MATERIAL, THE NUMBERS WE
USE TO INTERPRET REALITY. ALL
STATISTICAL PROBLEMS INVOLVE
EITHER THE COLLECTION,
DESCRIPTION, AND ANALYSIS OF
DATA, OR THINKING ABOUT THE
COLLECTION, DESCRIPTION, AND
ANALYSIS OF DATA.



THIS CHAPTER CONCENTRATES ON DATA DESCRIPTION. HOW CAN WE REPRESENT
DATA IN USEFUL WAYS? HOW CAN WE SEE UNDERLYING PATTERNS IN A HEAP OF
NAKED NUMBERS? HOW CAN WE SUMMARIZE THE DATA'S BASIC SHAPE?



WELL, TO DESCRIBE DATA, THE FIRST THING YOU NEED IS SOME ACTUAL DATA
TO DESCRIBE... SO LET'S COLLECT SOME DATA!
HERE IS SOME REAL DATA:

AS PART OF A CLASSROOM
EXPERIMENT, 92 PENN STATE
STUDENTS REPORTED THEIR
WEIGHT, WITH THESE
RESULTS:


MALES

140 145 160 190 155 165 150 190 195 138 160 155 153 145 170 175 175 170 180 135

170 157 130 185 190 155 170 155 215 150 145 155 155 150 155 150 180 160 135 160

130 155 150 148 155 150 140 180 190 145 150 164 140 142 136 123 155

FEMALES

140 120 130 138 121 125 116 145 150 112 125 130 120 130 131 120 118 125 135 125
118 122 115 102 115 150 110 116 108  95 125 133 110 150 108


GETTING RIGHT DOWN TO BUSINESS, WE DRAW A DOT PLOT: ONE DOT PER
STUDENT GOES OVER EACH STUDENT'S REPORTED WEIGHT.



[DOT PLOT: WEIGHT IN POUNDS, AXIS MARKED AT 100, 150, AND 200]


YOU MAY SEE A PROBLEM HERE:
THE CLUMPS AT 150 AND 155
POUNDS. THE STUDENTS TENDED
TO REPORT THEIR WEIGHT IN
FIVE-POUND INCREMENTS. IN
REAL-LIFE SITUATIONS LIKE THIS
ONE, SUCH ROUNDING OFF CAN
OBSCURE GENERAL PATTERNS IN
DATA... BUT FOR NOW, WE'LL JUST
WORK AROUND IT.
WE CAN SUMMARIZE THE DATA WITH A FREQUENCY TABLE. DIVIDE THE NUMBER
LINE INTO INTERVALS AND COUNT THE NUMBER OF STUDENT WEIGHTS WITHIN
EACH INTERVAL. THE FREQUENCY IS THE COUNT IN ANY GIVEN INTERVAL. THE
RELATIVE FREQUENCY IS THE PROPORTION OF WEIGHTS IN EACH INTERVAL,
I.E., IT'S THE FREQUENCY DIVIDED BY THE TOTAL NUMBER OF STUDENTS.


CLASS INTERVAL    MIDPOINT    FREQUENCY    RELATIVE FREQUENCY

 87.5-102.4          95           2             .022
102.5-117.4         110           9             .098
117.5-132.4         125          19             .206
132.5-147.4         140          17             .185
147.5-162.4         155          27             .293
162.5-177.4         170           8             .087
177.5-192.4         185           8             .087
192.5-207.4         200           1             .011
207.5-222.4         215           1             .011

TOTAL                            92            1.000


NOTE: WE KEPT THE INTERVAL BOUNDARIES AWAY FROM THOSE TROUBLESOME
5-POUND MULTIPLES. THIS GETS AROUND THE STUDENTS' REPORTING BIAS.
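The tallying behind the frequency table can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: `WEIGHTS` is a transcription of the 92 reported weights as best they can be read from the table of males and females, and `frequency_table` with its `start`, `width`, and `bins` parameters are hypothetical names chosen here.

```python
from collections import Counter

# Assumed transcription of the 92 reported weights (57 males, then 35 females).
WEIGHTS = [
    140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145, 170,
    175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155, 215, 150,
    145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155, 150, 148, 155,
    150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123, 155,
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130, 131,
    120, 118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116, 108, 95,
    125, 133, 110, 150, 108,
]


def frequency_table(data, start=87.5, width=15.0, bins=9):
    """Return (low, high, midpoint, frequency, relative frequency) per interval."""
    # Interval k covers [start + k*width, start + (k+1)*width).
    counts = Counter(int((x - start) // width) for x in data)
    n = len(data)
    rows = []
    for k in range(bins):
        low = start + k * width
        rows.append((low, low + width, low + width / 2, counts[k], counts[k] / n))
    return rows


for low, high, mid, freq, rel in frequency_table(WEIGHTS):
    print(f"{low:5.1f}-{high - 0.1:5.1f}  {mid:5.0f}  {freq:3d}  {rel:.3f}")
```

Because the boundaries sit at 87.5, 102.5, and so on, no reported weight ever lands exactly on an edge. Note that rounding 19/92 to three places gives .207, where the printed table shows .206.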


GUIDELINES FOR FORMING THE CLASS INTERVALS:


1) USE INTERVALS OF
   EQUAL LENGTH WITH
   MIDPOINTS AT
   CONVENIENT ROUND
   NUMBERS.

2) FOR A SMALL DATA
   SET, USE A SMALL
   NUMBER OF
   INTERVALS.

3) FOR A LARGE DATA
   SET, USE MORE
   INTERVALS.



IN THE FREQUENCY TABLE, WE ARE SHOWING HOW MANY DATA POINTS ARE
"AROUND" EACH VALUE. WE CAN GRAPH THIS INFORMATION, TOO. THE RESULTING
BAR GRAPH IS CALLED A HISTOGRAM. EACH BAR COVERS AN INTERVAL AND IS
CENTERED AT THE MIDPOINT. THE BAR'S HEIGHT IS THE NUMBER OF DATA
POINTS IN THE INTERVAL.



[HISTOGRAM: FREQUENCY AGAINST WEIGHT IN POUNDS]


WE CAN ALSO DRAW A RELATIVE FREQUENCY HISTOGRAM, PLOTTING THE
RELATIVE FREQUENCY AGAINST THE WEIGHT. IT LOOKS EXACTLY THE SAME,
EXCEPT FOR THE VERTICAL SCALE.



[RELATIVE FREQUENCY HISTOGRAM: WEIGHT IN POUNDS, AXIS MARKED AT 100, 150, AND 200]
THE STATISTICIAN JOHN TUKEY
INVENTED A QUICK WAY TO
SUMMARIZE DATA AND STILL KEEP
THE INDIVIDUAL DATA POINTS. IT'S
CALLED THE STEM-AND-LEAF
DIAGRAM.



FOR THE WEIGHT DATA, THE STEM IS A
COLUMN OF NUMBERS, CONSISTING OF
THE WEIGHT DATA COUNTED BY TENS
(I.E., WE LEAVE OFF THE LAST DIGIT):


 9      (90 POUNDS)
10      (100 POUNDS, ETC.)
11
12
13
14
15
16
17
18
19
20
21



FILLED IN, IT LOOKS LIKE THIS:

 9 : 5
10 : 288
11 : 628855060
12 : 01553005525
13 : 8500850600153
14 : 05505580502
15 : 5053705505505050500500
16 : 050004
17 : 055000
18 : 0500
19 : 00500
20 :
21 : 5

AND FINALLY, PUT THE "LEAVES" IN
ORDER:


 9 : 5
10 : 288
11 : 002556688
12 : 00012355555
13 : 0000013555688
14 : 00002555558
15 : 0000000000355555555557
16 : 000045
17 : 000055
18 : 0005
19 : 00005
20 :
21 : 5



ALL THOSE ZEROES AND FIVES CLEARLY
SHOW THE STUDENTS' REPORTING BIAS!
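As a rough illustration of how the ordered stem-and-leaf diagram is built, here is a short Python sketch. The function name `stem_and_leaf` is hypothetical, and `WEIGHTS` is an assumed transcription of the 92 reported weights from the data table.

```python
from collections import defaultdict

# Assumed transcription of the 92 reported weights (57 males, then 35 females).
WEIGHTS = [
    140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145, 170,
    175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155, 215, 150,
    145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155, 150, 148, 155,
    150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123, 155,
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130, 131,
    120, 118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116, 108, 95,
    125, 133, 110, 150, 108,
]


def stem_and_leaf(data):
    """Return the lines of an ordered stem-and-leaf diagram (stem = tens)."""
    leaves = defaultdict(list)
    for x in sorted(data):          # sorting first puts the leaves in order
        stem, leaf = divmod(x, 10)  # e.g. 148 -> stem 14, leaf 8
        leaves[stem].append(str(leaf))
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem:2d} : {''.join(leaves.get(stem, []))}")
    return lines


print("\n".join(stem_and_leaf(WEIGHTS)))
```

Dropping the call to `sorted` would give the unordered "filled in" version, with leaves in whatever order the data were recorded.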
CRUSADING NURSE FLORENCE NIGHTINGALE
COMPILED MORTALITY STATISTICS FROM
BRITISH MILITARY HOSPITALS, PRODUCING
SHOCKING HISTOGRAMS LIKE THIS ONE.
THE RADIAL AXIS
INDICATES DEATHS, IN
HOSPITALS AS WELL AS
ON THE BATTLEFIELD,
OF BRITISH SOLDIERS
IN THE CRIMEAN WAR.


HER STATISTICAL EFFORTS LED
DIRECTLY TO IMPROVED HOSPITAL
CONDITIONS AND A REDUCTION IN THE
DEATH RATE.


SAVED BY
STATISTICS!




SUMMARY STATISTICS

NOW WE MOVE FROM PICTURES TO FORMULAE. OUR OBJECT IS TO GET SOME
SIMPLE MEASUREMENTS OF THE CRUDEST CHARACTERISTICS OF A SET OF DATA...



ANY SET OF MEASUREMENTS
HAS TWO IMPORTANT
PROPERTIES: THE CENTRAL
OR TYPICAL VALUE, AND
THE SPREAD ABOUT THAT
VALUE. YOU CAN SEE THE
IDEA IN THESE
HYPOTHETICAL HISTOGRAMS.






A SMALL SET OF n DATA POINTS MAKES THE BOOKKEEPING EASY.
SUPPOSE, FOR EXAMPLE, WE ASK FIVE PEOPLE HOW MANY HOURS OF
TELEVISION THEY WATCH IN A WEEK, AND GET THE FOLLOWING ARRAY:


OBSERVATION    1    2    3    4    5

DATA VALUE     5    7    3   38    7


THEN $x_1 = 5$, $x_2 = 7$, $x_3 = 3$, $x_4 = 38$, AND $x_5 = 7$.


WHAT'S THE "CENTER" OF
THESE DATA? THERE ARE
ACTUALLY SEVERAL
DIFFERENT WAYS TO
MEASURE IT. WE'LL LOOK AT
JUST TWO OF THEM.



THE MEAN (OR "AVERAGE")

THE MEAN OR AVERAGE VALUE IS REPRESENTED
BY $\bar{x}$. IT'S OBTAINED BY ADDING ALL THE DATA AND
DIVIDING BY THE NUMBER OF OBSERVATIONS:

$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$

FOR THE TELEVISION DATA, $\bar{x} = \frac{5 + 7 + 3 + 38 + 7}{5} = 12$ HOURS.
IN THE CASE OF OUR 92 PENN STATE STUDENTS, THE MEAN WEIGHT IS


$\bar{x} = \frac{1}{92}\sum_{i=1}^{92} x_i = 145.15$ POUNDS







TO FIND THE MEDIAN
VALUE OF A DATA SET,
WE ARRANGE THE DATA
IN ORDER FROM
SMALLEST TO LARGEST.
THE MEDIAN IS THE
VALUE IN THE MIDDLE.


3   5   7   7   38

THE MEDIAN IS THE MIDDLE VALUE, 7.


IF THE NUMBER OF POINTS IS EVEN, IN WHICH CASE THERE IS NO MIDDLE, WE
AVERAGE THE TWO VALUES AROUND THE MIDDLE... SO IF THE DATA ARE


3   5  |  7   7

(THE MIDDLE SPACE FALLS BETWEEN 5 AND 7)


WE AVERAGE 5
AND 7 TO GET


$\frac{5 + 7}{2} = 6$




THIS GIVES US A GENERAL RULE: ORDER THE DATA FROM SMALLEST TO LARGEST.


IF THE NUMBER OF DATA
POINTS IS ODD, THE MEDIAN
IS THE MIDDLE DATA POINT.

IF THE NUMBER OF POINTS IS
EVEN, THE MEDIAN IS THE
AVERAGE OF THE TWO DATA
POINTS NEAREST THE MIDDLE.


r 3v)!T AS THE MEDIAN 
STEP'S POSITION is 

there, but not 
sThvE Strip - 





IT 







FOR THE n = 92 STUDENT WEIGHTS,
WE CAN FIND THE MEDIAN FROM THE
ORDERED STEM-AND-LEAF DIAGRAM.
JUST COUNT TO THE 46TH
OBSERVATION. THE MEDIAN IS


$\frac{x_{46} + x_{47}}{2} = \frac{145 + 145}{2} = 145$ POUNDS


 9 : 5
10 : 288
11 : 002556688
12 : 00012355555
13 : 0000013555688
14 : 00002555558
15 : 0000000000355555555557
16 : 000045
17 : 000055
18 : 0005
19 : 00005
20 :
21 : 5


WHY MORE THAN ONE MEASURE OF THE CENTER? EACH HAS ADVANTAGES. FOR
EXAMPLE, THE MEDIAN IS NOT SENSITIVE TO OUTLIERS, OR EXTREME VALUES
NOT TYPICAL OF THE REST OF THE DATA. SUPPOSE IN OUR SMALL TV-WATCHING
GROUP, ONE PERSON WATCHES 200 HOURS PER WEEK. THEN OUR
DATA ARE 3, 5, 7, 7, 200. THE MEDIAN, 7, IS UNCHANGED, BUT THE MEAN IS
NOW $\bar{x} = 44.4$!
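The two measures of center, and the way an outlier pulls on one but not the other, can be checked with a small Python sketch. The helper names `mean` and `median` are chosen here for illustration, and the fourth TV value is taken to be 38, the value consistent with the stated mean of 12.

```python
def mean(data):
    """Add all the data and divide by the number of observations."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the ordered data; average the middle two if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


tv_hours = [5, 7, 3, 38, 7]
with_outlier = [3, 5, 7, 7, 200]

print(mean(tv_hours), median(tv_hours))          # 12.0 7
print(mean(with_outlier), median(with_outlier))  # 44.4 7
print(median([3, 5, 7, 7]))                      # 6.0
```

Replacing 38 by 200 more than triples the mean while leaving the median at 7, which is the sense in which the median is "not sensitive" to outliers.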



W You're 
WSTOPW 
THfc MeW 
ne\<?nT 
, Too', y 


IN 1984 THE UNIVERSITY OF VIRGINIA ANNOUNCED
THAT ITS DEPARTMENT OF RHETORIC AND
COMMUNICATIONS GRADUATES' MEAN STARTING SALARY
WAS $55,000. THE OUTLIER, THE SALARY OF NBA
CENTER RALPH SAMPSON, DID NOT REPRESENT THE
EARNING POWER OF A BA IN SPEECH FROM U. OF V.
(THE MEDIAN SALARY WASN'T PUBLISHED.)
MEASURES OF SPREAD


BESIDES KNOWING THE
CENTRAL POINT OF A DATA
SET, WE'D ALSO LIKE TO
DESCRIBE THE DATA'S
SPREAD, OR HOW FAR
FROM THE CENTER THE
DATA TEND TO RANGE.

FOR INSTANCE, IF THE
STUDENTS ALL WEIGHED
EXACTLY 145 POUNDS,
THERE WOULD BE NO
SPREAD AT ALL.
NUMERICALLY THE SPREAD
WOULD BE ZERO, AND THE
HISTOGRAM WOULD BE
SKINNY.



BUT IF MANY OF THE STUDENTS WERE VERY LIGHT AND/OR VERY HEAVY,
OBVIOUSLY WE'D SEE SOME SPREAD, SAY IF THE FOOTBALL TEAM WAS PART
OF THE SAMPLE.



THE HISTOGRAM WOULD BE WIDER, SOMETHING LIKE THIS:


AGAIN, THERE'S MORE THAN ONE WAY TO MEASURE A SPREAD. ONE WAY IS THE

INTERQUARTILE RANGE

THE IDEA IS TO DIVIDE
THE DATA INTO FOUR
EQUAL GROUPS AND SEE
HOW FAR APART THE
EXTREME GROUPS ARE.



HERE'S THE RECIPE:


1) PUT THE DATA IN NUMERICAL
   ORDER.

2) DIVIDE THE DATA INTO TWO
   EQUAL HIGH AND LOW GROUPS
   AT THE MEDIAN. (IF THE
   MEDIAN IS A DATA POINT,
   INCLUDE IT IN BOTH THE HIGH
   AND LOW GROUPS.)

3) FIND THE MEDIAN OF THE
   LOW GROUP. THIS IS CALLED
   THE FIRST QUARTILE, OR $Q_1$.

4) THE MEDIAN OF THE HIGH
   GROUP IS THE THIRD
   QUARTILE, OR $Q_3$.


NOW THE INTERQUARTILE RANGE (IQR) IS THE DISTANCE (OR DIFFERENCE)
BETWEEN THEM:


$IQR = Q_3 - Q_1$
HERE'S THE WEIGHT DATA
WITH THE MIDPOINTS OF
THE HIGH AND LOW GROUPS
EMPHASIZED:

 9 : 5
10 : 288
11 : 002556688
12 : 00012355555
13 : 0000013555688
14 : 00002555558
15 : 0000000000355555555557
16 : 000045
17 : 000055
18 : 0005
19 : 00005
20 :
21 : 5

(THE MIDPOINT OF THE LOW GROUP IS $Q_1 = 125$; THE MIDPOINT OF THE HIGH
GROUP FALLS BETWEEN 155 AND 157, SO $Q_3 = 156$.)


AND WE SEE THAT

$IQR = 156 - 125 = 31$ POUNDS

AGAIN, THIS IS THE DIFFERENCE
BETWEEN THE MEDIAN HEAVY
STUDENT AND MEDIAN LIGHT ONE.


JOHN TUKEY INVENTED ANOTHER KIND OF
DISPLAY TO SHOW OFF THE IQR, CALLED A
BOX AND WHISKERS PLOT. THE BOX'S
ENDS ARE THE QUARTILES $Q_1$ AND $Q_3$. WE
DRAW THE MEDIAN INSIDE THE BOX.

IF A POINT IS MORE THAN 1.5 IQR FROM
AN END OF THE BOX, IT'S AN OUTLIER.
DRAW THE OUTLIERS INDIVIDUALLY.

FINALLY, EXTEND "WHISKERS" OUT TO THE
FARTHEST POINTS THAT ARE NOT OUTLIERS
(I.E., WITHIN 1.5 IQR OF THE QUARTILES).

[BOX AND WHISKERS PLOT OF THE WEIGHT DATA, WEIGHT IN POUNDS]


BOX-AND-WHISKERS
PLOTS ARE
ESPECIALLY
GOOD FOR
SHOWING OFF
DIFFERENCES
BETWEEN
GROUPS.
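The quartile recipe and the 1.5 IQR outlier rule can be sketched in Python as follows. This is a sketch under stated assumptions: `WEIGHTS` is an assumed transcription of the 92 reported weights, the function names are hypothetical, and the split at the median follows the recipe's convention of putting a middle data point into both halves. Library routines often use other quartile conventions and may give slightly different values.

```python
# Assumed transcription of the 92 reported weights (57 males, then 35 females).
WEIGHTS = [
    140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145, 170,
    175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155, 215, 150,
    145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155, 150, 148, 155,
    150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123, 155,
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130, 131,
    120, 118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116, 108, 95,
    125, 133, 110, 150, 108,
]


def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) using the low-group / high-group recipe."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2  # when n is odd, the median point joins both groups
    return median(ordered[:half]), median(ordered), median(ordered[n - half:])


q1, med, q3 = quartiles(WEIGHTS)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in WEIGHTS if x < low_fence or x > high_fence]
inside = [x for x in WEIGHTS if low_fence <= x <= high_fence]

print(q1, med, q3, iqr)          # 125.0 145.0 156.0 31.0
print(outliers)                  # [215]
print(min(inside), max(inside))  # whisker ends: 95 195
```

With these data the fences sit at 78.5 and 202.5 pounds, so only the 215-pound student is drawn individually and the whiskers run from 95 to 195.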




THE STANDARD MEASURE OF SPREAD IS THE

STANDARD DEVIATION

UNLIKE THE IQR, WHICH IS
BASED ON MEDIANS, THE
STANDARD DEVIATION MEASURES
THE SPREAD FROM THE MEAN.
YOU CAN THINK OF IT,
ROUGHLY SPEAKING, AS THE
AVERAGE DISTANCE OF THE
DATA FROM THE MEAN $\bar{x}$,
EXCEPT THAT WE USE THE SQUARES OF THE DISTANCES INSTEAD. THAT IS,
IF THE SQUARED DISTANCE OF POINT $x_i$ TO $\bar{x}$ IS $(x_i - \bar{x})^2$, THEN

AVERAGE SQUARED DISTANCE $= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$

FOR TECHNICAL REASONS, WE USE n-1 IN
THE DENOMINATOR RATHER THAN n, AND
DEFINE THE SAMPLE VARIANCE $s^2$ AS


$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$




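FOR THE DATA SET {3 5 7 7 38}, WITH $\bar{x} = 12$ AND $n = 5$, WE CALCULATE
THE VARIANCE:

$s^2 = \frac{(3-12)^2 + (5-12)^2 + (7-12)^2 + (7-12)^2 + (38-12)^2}{4} = \frac{856}{4} = 214$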





BUT A SPREAD MEASURE SHOULD
HAVE THE SAME UNITS AS THE
ORIGINAL DATA. IN THE
EXAMPLE OF WEIGHTS, THE
VARIANCE $s^2$ IS MEASURED IN
POUNDS SQUARED. OOOPS!




THE OBVIOUS THING TO DO IS TO
TAKE THE SQUARE ROOT, AND SO WE
DO, TO DEFINE:


STANDARD DEVIATION $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$



EVEN FOR SMALL DATA SETS,
THE ARITHMETIC CAN BE
TEDIOUS! SO NOWADAYS, WE
JUST HIT THE s BUTTON ON
THE HAND CALCULATOR, OR
CONSULT THE DATA REPORT
GENERATED BY A COMPUTER
SOFTWARE PACKAGE.
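For readers who would rather let the computer do the tedious arithmetic, here is a minimal Python sketch of the sample variance and standard deviation with n-1 in the denominator. The function names are hypothetical, and the TV data are taken as {3, 5, 7, 7, 38}, the values consistent with a mean of 12.

```python
import math


def sample_variance(data):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def sample_std(data):
    """Square root of the sample variance, in the units of the data."""
    return math.sqrt(sample_variance(data))


tv_hours = [3, 5, 7, 7, 38]
print(sample_variance(tv_hours))       # 214.0 (hours squared)
print(round(sample_std(tv_hours), 2))  # 14.63 (hours)
```

The squared deviations are 81, 49, 25, 25, and 676, which total 856; dividing by 4 gives 214, and the square root brings the answer back to hours.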





Properties of $\bar{x}$ and $s$

THE MEAN AND STANDARD
DEVIATION ARE VERY GOOD
FOR SUMMARIZING THE
PROPERTIES OF FAIRLY
SYMMETRICAL HISTOGRAMS
WITHOUT OUTLIERS, I.E.,
HISTOGRAMS SHAPED LIKE
MOUNDS.



IT'S OFTEN USEFUL TO KNOW HOW MANY STANDARD DEVIATIONS A DATA POINT
IS FROM THE MEAN. WE DEFINE z-SCORES, OR STANDARDIZED SCORES, AS
DISTANCE FROM $\bar{x}$ PER STANDARD DEVIATION:

$z_i = \frac{x_i - \bar{x}}{s}$

A z-SCORE OF +2 MEANS THAT AN OBSERVATION IS TWO STANDARD
DEVIATIONS ABOVE THE MEAN. FOR THE WEIGHT DATA ($\bar{x} = 145.2$ AND
$s = 23.7$), WE CAN PLOT THE DATA ON THE ORIGINAL x-AXIS IN POUNDS AND
THE z-SCORE AXIS SIMULTANEOUSLY.




[DOT PLOT OF THE WEIGHTS WITH TWO AXES: WEIGHT IN POUNDS (100, 150, 200) AND z-SCORE (-2, -1, 0, 1, 2)]



A STUDENT WEIGHING 175 POUNDS HAS A z-SCORE OF


$z = \frac{175 - 145.2}{23.7} = 1.26$





Cute L\'l 
OOTLIEfc ? 


AN EMPIRICAL RULE:

FOR NEARLY SYMMETRIC MOUND-SHAPED DATA SETS, APPROXIMATELY 68%
OF THE DATA IS WITHIN ONE STANDARD DEVIATION OF THE MEAN AND 95% OF
THE DATA IS WITHIN TWO STANDARD DEVIATIONS OF THE MEAN.


FOR THE WEIGHTS, OUR EMPIRICAL RULE HOLDS UP PRETTY WELL: 64%
(= 59/92) OF THE WEIGHTS ARE WITHIN ONE STANDARD DEVIATION OF THE
MEAN, AND 97% (= 89/92) OF THE WEIGHTS ARE WITHIN TWO STANDARD
DEVIATIONS OF THE MEAN.
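The z-scores and the empirical-rule counts can be reproduced with a short Python sketch. As before, `WEIGHTS` is an assumed transcription of the 92 reported weights and the helper name `z_score` is hypothetical.

```python
import math

# Assumed transcription of the 92 reported weights (57 males, then 35 females).
WEIGHTS = [
    140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145, 170,
    175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155, 215, 150,
    145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155, 150, 148, 155,
    150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123, 155,
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130, 131,
    120, 118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116, 108, 95,
    125, 133, 110, 150, 108,
]

n = len(WEIGHTS)
xbar = sum(WEIGHTS) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in WEIGHTS) / (n - 1))


def z_score(x):
    """Distance from the mean, measured in standard deviations."""
    return (x - xbar) / s


within_one = sum(1 for x in WEIGHTS if abs(z_score(x)) <= 1)
within_two = sum(1 for x in WEIGHTS if abs(z_score(x)) <= 2)

print(f"mean = {xbar:.2f}, s = {s:.1f}")         # mean = 145.15, s = 23.7
print(f"z for 175 pounds = {z_score(175):.2f}")  # 1.26
print(within_one, within_two)                    # 59 89
```

The counts 59 and 89 are the numerators behind the 64% and 97% figures quoted for the weights.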


[FIGURE: THE WEIGHTS PLOTTED AGAINST WEIGHT IN POUNDS AND z-SCORE, WITH BRACKETS MARKING 59 POINTS, 89 POINTS, AND 92 POINTS]


AND NOW
FOR A REST
FROM NUMBER
CRUNCHING!








WE'VE COME A LONG WAY IN THIS CHAPTER! STARTING WITH AN UNORGANIZED
PILE OF NUMBERS, WE HAVE


1) FOUND SEVERAL DIFFERENT
   WAYS TO DISPLAY THEM

2) LOOKED AT TWO DIFFERENT
   CONCEPTS OF THE CENTER OF
   DATA, THE MEDIAN AND THE
   MEAN

3) MEASURED THE SPREAD OF THE
   DATA AROUND THE CENTER IN
   TWO DIFFERENT WAYS

4) ENCOUNTERED MOUND-SHAPED
   HISTOGRAMS AND z, A VARIABLE
   THAT INDICATES HOW MANY
   STANDARD DEVIATIONS YOU ARE
   FROM THE MEAN


WOW! WE
DID ALL
THAT?



NOW, IN ORDER TO PROBE THE BEHAVIOR OF DATA MORE DEEPLY, WE'RE GOING
TO MAKE A LITTLE DETOUR INTO THE REALM OF RANDOMNESS... A LAND WHERE
THINGS ALWAYS WORK OUT IN THE LONG RUN, AND WHERE THE ONLY LAW IS
THE LAW OF THE GAMBLING CASINO.





♦ Chapter 3 ♦

PROBABILITY



NOTHING IN LIFE IS CERTAIN. IN EVERYTHING WE DO, WE
GAUGE THE CHANCES OF SUCCESSFUL OUTCOMES, FROM
BUSINESS TO MEDICINE TO THE WEATHER. BUT FOR MOST
OF HUMAN HISTORY, PROBABILITY, THE FORMAL STUDY OF THE
THE CHEVALIER REASONED
THAT THE AVERAGE NUMBER
OF SUCCESSFUL ROLLS WAS
THE SAME FOR BOTH GAMBLES:

CHANCE OF ONE SIX = 1/6

AVERAGE NUMBER IN
FOUR ROLLS = 4 x (1/6) = 2/3

CHANCE OF DOUBLE
SIX IN ONE ROLL = 1/36

AVERAGE NUMBER IN
24 ROLLS = 24 x (1/36) = 2/3

WHY, THEN, DID HE LOSE
MORE OFTEN WITH THE
SECOND GAMBLE???



DE MERE PUT THE QUESTION TO HIS FRIEND, THE GENIUS BLAISE PASCAL
(1623-1662).

ALTHOUGH PASCAL HAD EARLIER
GIVEN UP MATHEMATICS AS A FORM
OF SEXUAL INDULGENCE (!), HE
AGREED TO TACKLE DE MERE'S
PROBLEM.


PASCAL WROTE HIS
FELLOW GENIUS PIERRE
DE FERMAT, AND WITHIN
A FEW LETTERS, THE
TWO HAD WORKED OUT
THE THEORY OF
PROBABILITY IN ITS
MODERN FORM, EXCEPT,
OF COURSE, FOR THE
CARTOONS.














THE SAMPLE SPACE OF THE THROW OF A SINGLE DIE IS A LITTLE BIGGER.



AND FOR A PAIR OF DICE, THE SAMPLE SPACE LOOKS LIKE THIS (WE MAKE ONE
DIE WHITE AND ONE BLACK TO TELL THEM APART):

[FIGURE: THE 36 OUTCOMES OF A WHITE DIE AND A BLACK DIE, ARRANGED IN A 6 X 6 GRID]


THIS SAMPLE SPACE
HAS 36 (6 X 6)
ELEMENTARY OUTCOMES.
FOR THREE
DICE, THE SPACE
WOULD HAVE 216
ENTRIES, AS IN THIS
6 X 6 X 6 STACK. AND
FOUR DICE?
AT SOME POINT, WE HAVE TO STOP
LISTING, AND START THINKING...







NOW LET'S IMAGINE A
RANDOM EXPERIMENT WITH
n ELEMENTARY OUTCOMES
$O_1, O_2, \dots, O_n$. WE WANT TO
ASSIGN A NUMERICAL
WEIGHT, OR PROBABILITY,
TO EACH OUTCOME, WHICH
MEASURES THE LIKELIHOOD
OF ITS OCCURRING. WE
WRITE THE PROBABILITY OF
$O_i$ AS $P(O_i)$.


FOR EXAMPLE, IN A FAIR COIN
TOSS, HEADS AND TAILS ARE
EQUALLY LIKELY, AND WE
ASSIGN THEM BOTH THE
PROBABILITY .5:

P(H) = P(T) = .5

EACH OUTCOME COMES
UP HALF THE TIME.
ASK ANY FOOTBALL
PLAYER!



IN THE ROLL OF TWO DICE, THERE ARE 36 ELEMENTARY OUTCOMES, ALL
EQUALLY LIKELY, SO THE PROBABILITY OF EACH IS 1/36.


FOR INSTANCE,

P(BLACK 5, WHITE 2) = 1/36
WHICH MEANS: IF YOU ROLLED THE
DICE A VERY LARGE NUMBER OF TIMES,
IN THE LONG RUN THIS OUTCOME
WOULD OCCUR 1/36 OF THE TIME.


OiJE BiLLIOM, 'S. HOHbfteO 
MICUON ... BACK VlHEtZE- 
AMP 







WHAT IF OUR GAMBLER
CHEATS AND THROWS A
LOADED DIE? FOR THE SAKE
OF ARGUMENT, SUPPOSE THAT
NOW A ONE COMES UP 25%
OF THE TIME (IN THE LONG
RUN).



THE SAMPLE SPACE IS THE
SAME AS FOR A FAIR DIE:

{1, 2, 3, 4, 5, 6}

BUT THE PROBABILITIES ARE
DIFFERENT. NOW P(1) = .25
AND THE REMAINING
PROBABILITIES ADD UP TO .75.
IF 2, 3, 4, 5, AND 6 WERE
ALL EQUALLY LIKELY, THEN
EACH ONE WOULD HAVE
PROBABILITY .15 = (1/5)(.75).


FACE           1     2     3     4     5     6

PROBABILITY   .25   .15   .15   .15   .15   .15



IN GENERAL, ELEMENTARY OUTCOMES NEED NOT HAVE EQUAL PROBABILITY.
NOW WHAT CAN WE SAY
ABOUT THE PROBABILITIES
$P(O_i)$ IN AN ARBITRARY
RANDOM EXPERIMENT? FIRST OF
ALL,

$P(O_i) \ge 0$

PROBABILITIES ARE NEVER
NEGATIVE. A PROBABILITY OF
ZERO MEANS AN EVENT CAN'T
HAPPEN. LESS THAN ZERO
WOULD BE MEANINGLESS.




WoRS£ THAM 

IM?o64IBLE 

I^M‘T 


SECOND, IF AN EVENT IS CERTAIN TO HAPPEN, WE ASSIGN IT PROBABILITY 1.
(IN THE LONG RUN, THAT'S THE PROPORTION OF TIMES IT WILL OCCUR!)

IN PARTICULAR,
THE TOTAL
PROBABILITY OF
THE SAMPLE
SPACE MUST BE 1. IF WE DO
THE EXPERIMENT, SOMETHING
IS BOUND TO HAPPEN!


PUT THESE TWO TOGETHER, AND YOU HAVE THE CHARACTERISTIC
PROPERTIES OF PROBABILITY:


$P(O_i) \ge 0$                                      PROBABILITY IS NON-NEGATIVE

$P(O_1) + P(O_2) + \dots + P(O_n) = 1$              TOTAL PROBABILITY OF ALL
                                                    ELEMENTARY OUTCOMES IS ONE
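The two characteristic properties are easy to check mechanically. The Python sketch below tests them for a fair die and for the loaded die described earlier (a one comes up 25% of the time, the other five faces share the remaining 75% equally). The function name `is_valid_assignment` is hypothetical, and exact fractions are used so that the total comes out to exactly one.

```python
from fractions import Fraction


def is_valid_assignment(probs):
    """Check the two properties: each P(O_i) >= 0 and the P(O_i) sum to 1."""
    return all(p >= 0 for p in probs.values()) and sum(probs.values()) == 1


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Loaded die: P(1) = 1/4, and faces 2..6 each get (1/5)(3/4) = 3/20 = .15.
loaded_die = {1: Fraction(1, 4)}
loaded_die.update({face: Fraction(3, 20) for face in range(2, 7)})

print(is_valid_assignment(fair_die))              # True
print(is_valid_assignment(loaded_die))            # True
print(float(loaded_die[2]))                       # 0.15
print(is_valid_assignment({"H": 0.7, "T": 0.7}))  # False: total exceeds one
```

Both dice pass even though only one of them has equally likely outcomes, which is the point of the remark that elementary outcomes need not have equal probability.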


vf X 

M&TAPHy-SlcS 

vj/ll 

-My ^>Ulf2T--• . 







LIKE A CLEVER POLITICIAN, WE
HAVE AVOIDED CERTAIN
UNPLEASANT QUESTIONS,
SUCH AS A) WHAT DOES
PROBABILITY MEAN? AND
B) HOW DO WE ASSIGN
PROBABILITIES TO OUTCOMES?


B-DUH, B-DUH...
LET'S DISCUSS
SOMETHING EASIER,
LIKE GAYS IN THE
MILITARY.


HERE ARE SOME APPROACHES THAT HAVE BEEN TAKEN:




BET?


Classical PROBABILITY:

BASED ON GAMBLING IDEAS, THE
FUNDAMENTAL ASSUMPTION IS THAT
THE GAME IS FAIR AND ALL
ELEMENTARY OUTCOMES HAVE THE
SAME PROBABILITY.


C'MON!
DADDY NEEDS
A NEW
THEORY!


Relative Frequency:

WHEN AN EXPERIMENT CAN BE REPEATED,
THEN AN EVENT'S PROBABILITY IS THE
PROPORTION OF TIMES THE EVENT
OCCURS IN THE LONG RUN.


Personal PROBABILITY:

MOST OF LIFE'S EVENTS ARE NOT
REPEATABLE. PERSONAL PROBABILITY
IS AN INDIVIDUAL'S PERSONAL
ASSESSMENT OF AN OUTCOME'S
LIKELIHOOD. IF A GAMBLER BELIEVES
THAT A HORSE HAS MORE THAN A 50%
CHANCE OF WINNING, HE'LL TAKE AN
EVEN BET ON THAT HORSE.


AN OBJECTIVIST USES EITHER THE
CLASSICAL OR FREQUENCY DEFINITION
OF PROBABILITY. A SUBJECTIVIST OR
BAYESIAN APPLIES FORMAL LAWS OF
CHANCE TO HIS OWN, OR YOUR,
PERSONAL PROBABILITIES.


HOW D O You £Hoyl? 

V/YpPoM OF 

, VpAmck ;—” 




HOW DO YOU KNOW THE
ELEMENTARY OUTCOMES
ARE EQUALLY LIKELY
WITHOUT ROLLING THE
DICE A BILLION TIMES?










BASIC OPERATIONS 

SO FAR, WE HAVE DISCUSSED ONLY THE
PROBABILITY OF ELEMENTARY OUTCOMES.
IN THEORY, THAT WOULD BE ENOUGH TO
DESCRIBE ANY RANDOM EXPERIMENT, BUT
IN PRACTICE IT'S PRETTY UNWIELDY. FOR
EXAMPLE, EVEN SUCH AN ORDINARY
OCCURRENCE AS ROLLING A SEVEN IS NOT
AN ELEMENTARY OUTCOME... SO WE
INTRODUCE A NEW IDEA:



AN EVENT IS A SET OF ELEMENTARY OUTCOMES. THE PROBABILITY OF AN
EVENT IS THE SUM OF THE PROBABILITIES OF THE ELEMENTARY OUTCOMES IN
THE SET. FOR INSTANCE, SOME EVENTS IN THE LIFE OF A TWO-DICED ROLLER
ARE:

EVENT  DESCRIPTION          EVENT'S ELEMENTARY OUTCOMES                  PROBABILITY

A      DICE ADD TO 3        {(1,2), (2,1)}                               2/36

B      DICE ADD TO 6        {(1,5), (2,4), (3,3), (4,2), (5,1)}          5/36

C      WHITE DIE SHOWS 1    {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)}   6/36

D      BLACK DIE SHOWS 1    {(1,1), (2,1), (3,1), (4,1), (5,1), (6,1)}   6/36



y 





THE BEAUTY OF USING
EVENTS, RATHER THAN
ELEMENTARY OUTCOMES, IS
THAT WE CAN COMBINE
EVENTS TO MAKE OTHER
EVENTS, USING LOGICAL
OPERATIONS. THE
RELEVANT WORDS ARE
AND, OR, AND NOT.



JUST 3
LITTLE
WORDS,
CHEVALIER...



THAT IS, GIVEN EVENTS E AND F, WE CAN MAKE NEW EVENTS:


E AND F : THE EVENT E AND THE EVENT F BOTH OCCUR.

E OR F : THE EVENT E OR THE EVENT F OCCURS (OR BOTH DO).

NOT E : THE EVENT E DOES NOT OCCUR.


COMBINING OUR PRIMITIVE
DEFINITIONS OF PROBABILITY WITH
THESE LOGICAL OPERATIONS WILL
GIVE US SOME POWERFUL
FORMULAS FOR MANIPULATING
PROBABILITIES.


f \ GAMBLE COMPULSIVELY 

AMD I LOST MV SMIRT 
AND M PASCAL IS STitL 
WORKING ON My PROBLEM V 
VIHAT APE My CHANCE* 

AV« TO, CkER/P ■» ^ 

Vy. ._ slim 

/ ( OR 


■\U M c,« 





WORSE 




'//*/ 
LET'S RETURN TO THE DICE-THROWING EXAMPLE. IF C IS THE EVENT, WHITE
DIE = 1, AND D IS THE EVENT, BLACK DIE = 1, THEN:


[FIGURE: THE 36 OUTCOMES OF TWO DICE, WITH THE EVENTS C AND D SHADED]


C OR D IS THE
ENTIRE SHADED
AREA (WHERE
ONE DIE OR THE
OTHER IS 1).

C AND D IS
WHERE THE
SHADED AREAS
OVERLAP (BOTH
DICE ARE 1).


THIS ILLUSTRATES THE ADDITION RULE: FOR ANY EVENTS E, F,

P(E OR F) = P(E) + P(F) - P(E AND F) 

ADDING P(E) + P(F) DOUBLE COUNTS THE ELEMENTARY OUTCOMES SHARED BY
E AND F, SO WE HAVE TO SUBTRACT THE EXTRA AMOUNT, WHICH IS P(E AND F).
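The addition rule can be verified by brute force on the 36-outcome sample space. In the Python sketch below, events are ordinary sets of (white, black) pairs, so AND, OR, and NOT become set intersection, union, and difference. The names `SAMPLE_SPACE` and `prob` are chosen here for illustration, and the dice are assumed fair.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (white, black) outcomes for two fair dice.
SAMPLE_SPACE = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event: its share of the equally likely outcomes."""
    return Fraction(len(event), len(SAMPLE_SPACE))


A = {o for o in SAMPLE_SPACE if sum(o) == 3}  # dice add to 3
B = {o for o in SAMPLE_SPACE if sum(o) == 6}  # dice add to 6
C = {o for o in SAMPLE_SPACE if o[0] == 1}    # white die shows 1
D = {o for o in SAMPLE_SPACE if o[1] == 1}    # black die shows 1

# Addition rule: P(C OR D) = P(C) + P(D) - P(C AND D)
print(prob(C | D), prob(C) + prob(D) - prob(C & D))  # 11/36 11/36

# A and B share no outcomes, so the overlap term vanishes.
print(prob(A & B), prob(A | B))                      # 0 7/36

# NOT E is the complement within the sample space.
double_one = {(1, 1)}
print(prob(SAMPLE_SPACE - double_one))               # 35/36
```

Counting outcomes directly and applying the formula give the same 11/36, because the outcome (1,1) would otherwise be counted once in C and again in D.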
SOMETIMES, THE OVERLAP E AND F IS EMPTY, AND THE TWO EVENTS HAVE
NO ELEMENTARY OUTCOMES IN COMMON. IN THAT CASE, WE SAY E AND F ARE
MUTUALLY EXCLUSIVE, MAKING P(E AND F) = 0. HERE WE SEE THE MUTUALLY
EXCLUSIVE EVENTS A, THE DICE ADD TO 3, AND B, THE DICE ADD TO 6:


[FIGURE: THE 36 OUTCOMES OF TWO DICE, WITH THE NON-OVERLAPPING EVENTS A AND B SHADED]


FOR MUTUALLY EXCLUSIVE EVENTS, WE GET A SPECIAL ADDITION RULE: IF E
AND F ARE MUTUALLY EXCLUSIVE, THEN

P(E OR F) = P(E) + P(F)

AND WE CHECK THAT P(A OR B) = 7/36 = 2/36 + 5/36 = P(A) + P(B)


AND FINALLY, A SUBTRACTION RULE: FOR ANY EVENT E,


P(E) = 1 - P(NOT E)


THIS IS USEFUL WHEN P(NOT E) IS EASIER TO COMPUTE THAN P(E). FOR
INSTANCE, LET E BE THE EVENT, A DOUBLE-1 IS NOT THROWN. THE EVENT
NOT E, A DOUBLE-1 IS THROWN, HAS PROBABILITY P(NOT E) = 1/36. SO


P(E) = 1 - P(NOT E)

     = 1 - 1/36
     = 35/36


THE FORMULAS WE JUST DERIVED
ARE, IN FACT, ADEQUATE FOR
ANSWERING DE MERE'S QUESTION,
BUT NOT EASILY! (YOU MIGHT TRY
USING THEM ON A SIMPLER
QUESTION: WHAT'S THE PROBABILITY
OF ROLLING AT LEAST ONE SIX IN
TWO ROLLS OF A SINGLE DIE?) WE
NEED MORE MACHINERY!


SO WE INTRODUCE


conditional probability

(AN ESSENTIAL CONCEPT IN
STATISTICS!)


SUPPOSE WE ALTER OUR EXPERIMENT SLIGHTLY, AND THROW THE WHITE DIE
BEFORE THE BLACK DIE. WHAT'S THE PROBABILITY THAT THE FACES SUM TO 3?


BEFORE THE DICE
ARE THROWN, THE
PROBABILITY IS

P(A) = 2/36


NOW SUPPOSE THE
WHITE DIE COMES
UP 1 (EVENT C).
WHAT'S THE
PROBABILITY
OF A NOW?


WE CALL IT THE
CONDITIONAL
PROBABILITY THAT EVENT
A WILL OCCUR, GIVEN
THE CONDITION THAT
EVENT C HAS ALREADY
OCCURRED. WE WRITE

P(A | C)

AND SAY "THE
PROBABILITY OF A,
GIVEN C."


BEFORE ANY DICE WERE THROWN, THE SAMPLE SPACE HAD 36 OUTCOMES, BUT
NOW THAT THE EVENT C HAS OCCURRED, THE OUTCOME MUST BELONG TO THE
REDUCED SAMPLE SPACE C.


[FIGURE: THE SIX OUTCOMES IN WHICH THE WHITE DIE SHOWS 1]

IN THE REDUCED SAMPLE SPACE OF SIX ELEMENTARY OUTCOMES, ONLY ONE
OUTCOME, (1,2), SUMS TO 3. SO THE CONDITIONAL PROBABILITY IS 1/6.


IN GENERAL, TO FIND
THE CONDITIONAL
PROBABILITY P(E | F),
WE LOOK AT THE
EVENT E AND F AS
PART OF THE REDUCED
SAMPLE SPACE F.


WE TRANSLATE THIS
INTO A FORMAL
DEFINITION: THE CONDITIONAL
PROBABILITY OF E, GIVEN F, IS

P(E | F) = P(E AND F) / P(F)

FROM WHICH YOU CAN DIRECTLY
VERIFY SOME INTUITIVE FACTS:

P(E | E) = 1 (ONCE E OCCURS,
IT'S CERTAIN.)



WHEN E AND F ARE MUTUALLY
EXCLUSIVE,


P(E | F) = 0


(ONCE F HAS
OCCURRED, E IS
IMPOSSIBLE.)



REARRANGING THE DEFINITION GIVES US THE MULTIPLICATION RULE:

P(E AND F) = P(E | F)P(F)

WHICH WE WOULD LIKE TO REDUCE TO A "SPECIAL" MULTIPLICATION RULE,
UNDER THE FAVORABLE CIRCUMSTANCES THAT P(E | F) = P(E). THAT WOULD BE
EXCELLENT!



AND WHILE YOU'RE
WAITING FOR THE
NEXT PAGE, NOTE THAT
SWAPPING E AND F
PROVES THAT

P(F)P(E | F) = P(E)P(F | E)






INDEPENDENCE and the
special multiplication rule.

TWO EVENTS E AND F ARE INDEPENDENT OF EACH OTHER IF THE
OCCURRENCE OF ONE HAS NO INFLUENCE ON THE PROBABILITY OF THE
OTHER. FOR INSTANCE, THE ROLL OF ONE DIE HAS NO EFFECT ON THE ROLL
OF ANOTHER (UNLESS THEY'RE GLUED TOGETHER, MAGNETIC, ETC.!).




IN TERMS OF CONDITIONAL PROBABILITY, THIS AMOUNTS TO SAYING
P(E) = P(E | F) OR, EQUIVALENTLY, P(F) = P(F | E). WHEN E AND F ARE
INDEPENDENT, WE GET A SPECIAL MULTIPLICATION RULE:

P(E AND F) = P(E)P(F)


LET'S VERIFY THE INDEPENDENCE OF DICE, USING THE FORMULAS. C IS THE
EVENT WHITE DIE COMES UP 1; D IS THE EVENT BLACK DIE COMES UP 1; AND
WE HAVE:

P(C | D) = P(C AND D) / P(D) = (1/36) / (1/6) = 1/6 = P(C)

BUT THE WHITE DIE SHOWING 1 OBVIOUSLY DOES AFFECT THE CHANCES THAT
THE SUM OF THE TWO DICE IS 3:

P(A | C) = P(A AND C) / P(C) = (1/36) / (1/6) = 1/6, WHILE P(A) = 1/18

SO THESE TWO EVENTS ARE NOT INDEPENDENT.
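The same enumeration idea gives a quick check of conditional probability and independence. This Python sketch assumes two fair dice and uses hypothetical helper names `prob`, `cond_prob`, and `independent`; it is meant only to illustrate the definitions, not to stand in for the argument above.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (white, black) outcomes for two fair dice.
SAMPLE_SPACE = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event as a share of the equally likely outcomes."""
    return Fraction(len(event), len(SAMPLE_SPACE))


def cond_prob(e, f):
    """P(E | F) = P(E AND F) / P(F), defined when P(F) > 0."""
    return prob(e & f) / prob(f)


def independent(e, f):
    """Special multiplication rule: P(E AND F) = P(E) P(F)."""
    return prob(e & f) == prob(e) * prob(f)


A = {o for o in SAMPLE_SPACE if sum(o) == 3}  # dice add to 3
C = {o for o in SAMPLE_SPACE if o[0] == 1}    # white die shows 1
D = {o for o in SAMPLE_SPACE if o[1] == 1}    # black die shows 1

print(cond_prob(C, D), prob(C))  # 1/6 1/6   -> C and D are independent
print(cond_prob(A, C), prob(A))  # 1/6 1/18  -> A and C are not
print(independent(C, D), independent(A, C))  # True False
```

Knowing the white die shows 1 triples the chance that the sum is 3, from 1/18 to 1/6, while knowing the black die shows 1 tells us nothing about the white die.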






BEFORE GOING ON, LET'S SUMMARIZE ALL THE RULES WE'VE ACCUMULATED:


ADDITION RULE:

P(E OR F) = P(E) + P(F) - P(E AND F)


SPECIAL ADDITION RULE: WHEN E AND F ARE
MUTUALLY EXCLUSIVE,

P(E OR F) = P(E) + P(F)

SUBTRACTION RULE:

P(E) = 1 - P(NOT E)

MULTIPLICATION RULE:

P(E AND F) = P(E | F)P(F)

SPECIAL MULTIPLICATION RULE: WHEN E
AND F ARE INDEPENDENT,

P(E AND F) = P(E)P(F)



AND NOW, DE MERE'S PROBLEM AT LAST... LET E BE THE EVENT OF GETTING
AT LEAST ONE SIX IN FOUR ROLLS OF A SINGLE DIE. WHAT'S P(E)? THIS IS
ONE OF THOSE EVENTS WHOSE NEGATIVE IS EASIER TO DESCRIBE: NOT E IS
THE EVENT OF GETTING NO SIXES IN FOUR THROWS.



IF $A_i$ IS THE EVENT, GETTING NO
SIX ON THE i-TH THROW, WE KNOW
THAT $P(A_i) = 5/6$. WE ALSO KNOW
THAT ROLLS ARE INDEPENDENT, SO
(BY THE MULTIPLICATION RULE)


P(NOT E) = $P(A_1 \text{ AND } A_2 \text{ AND } A_3 \text{ AND } A_4) = (5/6)^4 = .482$


SO


P(E) = 1 - P(NOT E) = .518




NOW THE SECOND HALF: LET F BE THE EVENT, GETTING AT LEAST ONE
DOUBLE SIX IN 24 THROWS. AGAIN, NOT F IS EASIER TO DESCRIBE. IT'S THE
EVENT OF GETTING NO DOUBLE SIXES.



IF $B_i$ IS THE EVENT, NO DOUBLE
SIX IS THROWN ON THE i-TH
ROLL, THEN NOT F = $B_1$ AND $B_2$
AND ... $B_{24}$. THE PROBABILITY OF
EACH $B_i$ IS

$P(B_i) = 35/36$, SO

P(NOT F) = $(35/36)^{24} = .509$

(BY THE MULTIPLICATION RULE)
AND WE CONCLUDE THAT

P(F) = 1 - P(NOT F) = 1 - .509

     = .491




DE MERE TOLD PASCAL HE HAD ACTUALLY OBSERVED THAT EVENT F OCCURRED
LESS OFTEN THAN EVENT E, BUT HE WAS AT A LOSS TO EXPLAIN WHY... FROM
WHICH WE CONCLUDE THAT DE MERE GAMBLED OFTEN AND KEPT CAREFUL
RECORDS!!
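Both halves of de Mere's problem follow the same pattern: subtraction rule plus the special multiplication rule for independent rolls. A minimal Python sketch, assuming fair dice and using the hypothetical helper name `at_least_one`:

```python
from fractions import Fraction


def at_least_one(p_success, trials):
    """P(at least one success) = 1 - P(no successes) for independent trials."""
    return 1 - (1 - p_success) ** trials


p_e = at_least_one(Fraction(1, 6), 4)    # at least one six in four rolls
p_f = at_least_one(Fraction(1, 36), 24)  # at least one double six in 24 throws

print(p_e, round(float(p_e), 3))         # 671/1296 0.518
print(round(float(p_f), 3))              # 0.491
print(at_least_one(Fraction(1, 6), 2))   # 11/36, the "simpler question"
```

The first bet wins slightly more than half the time and the second slightly less, even though the chevalier's "average number of successes" is 2/3 in both cases.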
BAYES THEOREM and the

case of the false positives...

FOR A MORE SERIOUS APPLICATION OF
CONDITIONAL PROBABILITY, LET'S ENTER
AN ARENA OF LIFE AND DEATH.



SUPPOSE A RARE DISEASE INFECTS ONE OUT OF EVERY 1000 PEOPLE IN A
POPULATION...



AND SUPPOSE THAT THERE IS A GOOD, BUT NOT PERFECT, TEST FOR THIS
DISEASE: IF A PERSON HAS THE DISEASE, THE TEST COMES BACK POSITIVE 99%
OF THE TIME. ON THE OTHER HAND, THE TEST ALSO PRODUCES SOME FALSE
POSITIVES. ABOUT 2% OF UNINFECTED PATIENTS ALSO TEST POSITIVE. AND YOU
JUST TESTED POSITIVE. WHAT ARE YOUR CHANCES OF HAVING THE DISEASE?





LET'4 


VAJAY 


TUl^ 


IT 


■^HoUUP l 
PM IM AC*/A»l£E 


4fa 




WE HAVE TWO EVENTS TO WORK WITH:


A : PATIENT HAS THE DISEASE
B : PATIENT TESTS POSITIVE

THE INFORMATION ABOUT THE TEST'S
EFFECTIVENESS CAN BE WRITTEN


P(A) = .001

(ONE PATIENT IN 1000 HAS THE DISEASE)

P(B | A) = .99

(PROBABILITY OF A POSITIVE TEST,
GIVEN INFECTION, IS .99)

P(B | NOT A) = .02

(PROBABILITY OF A FALSE POSITIVE, GIVEN
NO INFECTION, IS .02)

AND WE ASK


P(A | B) = WHAT?

(PROBABILITY OF HAVING THE DISEASE,
GIVEN A POSITIVE TEST)



SINCE THE TREATMENT FOR THIS DISEASE HAS SERIOUS SIDE EFFECTS, THE
DOCTOR, HER LAWYER, AND HER LAWYER'S LAWYER CALL ON JOE BAYES, C.P.
(CONSULTING PROBABILIST), FOR AN ANSWER. JOE DERIVES A THEOREM FIRST
PROVED BY HIS ANCESTOR, THE REV. THOMAS BAYES (C. 1701-1761).
JOE BEGINS WITH A 2X2 TABLE, WHICH DIVIDES THE SAMPLE SPACE INTO FOUR
MUTUALLY EXCLUSIVE EVENTS. IT DISPLAYS EVERY POSSIBLE COMBINATION OF
DISEASE STATE AND TEST RESULT.

         A                 NOT A
B        A AND B           NOT A AND B
NOT B    A AND NOT B       NOT A AND NOT B


LET'S FIND THE PROBABILITIES OF EACH EVENT IN THE TABLE:

         A                 NOT A                  SUM
B        P(A AND B)        P(NOT A AND B)         P(B)
NOT B    P(A AND NOT B)    P(NOT A AND NOT B)     P(NOT B)
SUM      P(A)              P(NOT A)               1

THE PROBABILITIES IN THE MARGINS ARE FOUND BY SUMMING ACROSS ROWS
AND DOWN COLUMNS.


NOW COMPUTE:

P(A AND B) = P(B | A) P(A) = (.99)(.001) = .00099

P(NOT A AND B) = P(B | NOT A) P(NOT A) = (.02)(.999) = .01998

ALLOWING US TO FILL IN SOME ENTRIES: 



         A                 NOT A                  SUM
B        .00099            .01998                 .02097
NOT B    P(A AND NOT B)    P(NOT A AND NOT B)     P(NOT B)
SUM      .001              .999                   1


WE FIND THE REMAINING PROBABILITIES BY SUBTRACTING IN THE COLUMNS, THEN
ADDING ACROSS THE ROWS.





THE FINAL TABLE IS

         A          NOT A       SUM
B        .00099     .01998      .02097     = P(B)
NOT B    .00001     .97902      .97903     = P(NOT B)
SUM      .001       .999        1
         = P(A)     = P(NOT A)


FROM WHICH WE DIRECTLY DERIVE

P(A | B) = P(A AND B) / P(B) = .00099 / .02097 = .0472

DESPITE THE HIGH ACCURACY OF THE TEST, LESS THAN 5% OF THOSE WHO
TEST POSITIVE ACTUALLY HAVE THE DISEASE! THIS IS CALLED THE FALSE
POSITIVE PARADOX.




THIS TABLE SHOWS WHAT HAPPENS IN A GROUP OF A THOUSAND PATIENTS.
ON AVERAGE, ONLY 21 PEOPLE WILL TEST POSITIVE, AND ONLY ONE OF THEM
HAS THE DISEASE! 20 FALSE POSITIVES COME FROM THE MUCH LARGER
UNINFECTED GROUP.

                   DISEASE    NO DISEASE    SUM
TESTS POSITIVE     1          20            21
TESTS NEGATIVE     0          979           979
SUM                1          999           1000
WHAT'S THE PHYSICIAN TO DO? JOE BAYES ADVISES HER NOT TO START
TREATMENT ON THE BASIS OF THIS TEST ALONE. THE TEST DOES PROVIDE
INFORMATION, HOWEVER: WITH A POSITIVE TEST THE PATIENT'S CHANCE OF
HAVING THE DISEASE INCREASED FROM 1 IN 1000 TO ABOUT 1 IN 21. THE
DOCTOR FOLLOWS UP WITH MORE TESTS.



JOE BAYES COLLECTS HIS CONSULTING CHECK BEFORE ADMITTING THAT ALL
THOSE STEPS HE WENT THROUGH CAN BE COMPRESSED INTO THE SINGLE
FORMULA CALLED BAYES THEOREM:

P(A | B) = P(A) P(B | A) / [ P(A) P(B | A) + P(NOT A) P(B | NOT A) ]

IT COMPUTES P(A | B) FROM P(A) AND THE TWO CONDITIONAL PROBABILITIES
P(B | A) AND P(B | NOT A). YOU CAN DERIVE IT BY NOTING THAT THE BIG
FRACTION CAN BE EXPRESSED AS

P(A AND B) / [ P(A AND B) + P(NOT A AND B) ] = P(A AND B) / P(B) = P(A | B)
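
As a check on Joe's arithmetic, Bayes theorem can be written as a short Python function. This is a minimal sketch; the function and argument names are illustrative choices, and the inputs are the three numbers given for the disease example.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from P(A), P(B | A), and P(B | NOT A)."""
    p_a_and_b = p_a * p_b_given_a                # P(A AND B)
    p_not_a_and_b = (1 - p_a) * p_b_given_not_a  # P(NOT A AND B)
    return p_a_and_b / (p_a_and_b + p_not_a_and_b)


posterior = bayes_posterior(p_a=0.001, p_b_given_a=0.99, p_b_given_not_a=0.02)
print(f"P(A | B) = {posterior:.4f}")
print(f"about 1 in {round(1 / posterior)}")
```

The two intermediate variables are the top row of the 2x2 table, so the function follows the same steps as the table method.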
IN THIS CHAPTER, WE COVERED THE
BASICS OF PROBABILITY: ITS DEFINITION,
SAMPLE SPACES AND ELEMENTARY
OUTCOMES, CONDITIONAL PROBABILITY,
AND SOME BASIC FORMULAS FOR
COMPUTING PROBABILITIES. WE
ILLUSTRATED THESE IDEAS USING A
2-DICE SAMPLE SPACE. FOR THE MODERN
GAMBLER, PROBABILITY IS THE POWER
TOOL OF CHOICE.

[FIGURE: THE 2-DICE SAMPLE SPACE]


AND FINALLY, IN THE MEDICAL EXAMPLE, WE SHOWED HOW THESE ABSTRACT
IDEAS COULD HELP TO MAKE GOOD DECISIONS IN THE FACE OF IMPERFECT
INFORMATION AND REAL RISK: THE ULTIMATE GOAL OF STATISTICS.







BUT THIS IS JUST THE BEGINNING. FOR US, PROBABILITY IS ONLY A TOOL (AN
ESSENTIAL TOOL, TO BE SURE) IN THE STUDY OF STATISTICS. IN THE CHAPTERS
THAT FOLLOW, WE'LL EXPLORE THE SUBTLE RELATIONSHIP BETWEEN
PROBABILITY, VARIATIONS IN STATISTICAL DATA, AND OUR CONFIDENCE IN
INTERPRETING THE MEANING OF OUR OBSERVATIONS.





♦ Chapter 4 ♦

RANDOM VARIABLES


IN CHAPTER 2, WE SAW THAT OBSERVATIONS OF NUMERICAL
DATA, LIKE STUDENTS' WEIGHTS, CAN BE GRAPHED AND
SUMMARIZED IN TERMS OF MIDPOINTS, SPREADS, OUTLIERS, ETC.
IN CHAPTER 3, WE SAW HOW PROBABILITIES CAN BE ASSIGNED
TO THE OUTCOMES OF A RANDOM EXPERIMENT.



IF WE IMAGINE A RANDOM EXPERIMENT REPEATED MANY TIMES,
WE EXPECT THAT THE ACTUAL OUTCOMES OVER TIME WILL BE
GOVERNED BY THEIR PROBABILITIES. THE PROBABILITIES FORM A
MODEL FOR REAL-LIFE EXPERIMENTS... SO WHY NOT DO FOR THE
MODEL WHAT WE'VE ALREADY DONE FOR THE DATA IT DESCRIBES?



THE KEY IDEA IS THE RANDOM VARIABLE, WHICH WE...



A RANDOM VARIABLE IS DEFINED AS THE NUMERICAL OUTCOME OF A
RANDOM EXPERIMENT.


FOR EXAMPLE, IMAGINE DRAWING ONE STUDENT AT RANDOM FROM THE
STUDENT BODY. THAT'S THE RANDOM EXPERIMENT. THE STUDENT'S HEIGHT,
WEIGHT, FAMILY INCOME, S.A.T. SCORE, AND GRADE POINT AVERAGE ARE
ALL NUMERICAL VARIABLES DESCRIBING PROPERTIES OF THE RANDOMLY
SELECTED STUDENT. THEY'RE ALL RANDOM VARIABLES.



ANOTHER EXAMPLE: TOSS TWO COINS (THE RANDOM EXPERIMENT) AND RECORD
THE NUMBER OF HEADS: 0, 1, OR 2.

[FIGURE: EACH OUTCOME OF THE TWO TOSSES AND ITS VALUE OF X]

NOTE THE NOTATION! THE VARIABLE IS WRITTEN WITH A CAPITAL X. THE
LOWERCASE x REPRESENTS A SINGLE VALUE OF X, FOR EXAMPLE x = 2, IF
HEADS COMES UP TWICE.
ANOTHER EXAMPLE IS
BASED ON THE FAMILIAR
TOSS OF TWO DICE. LET
Y REPRESENT THE SUM
OF THE DOTS ON THE
TWO DICE. FOR THIS
RANDOM VARIABLE, y
CAN BE ANY WHOLE NUMBER
BETWEEN 2 AND 12.

[FIGURE: A ROLL OF TWO DICE SHOWING Y = 7]


NOW WE WANT TO LOOK AT THE PROBABILITIES OF THE OUTCOMES. FOR
THE PROBABILITY THAT THE RANDOM VARIABLE X HAS THE VALUE x, WE
WRITE Pr(X = x), OR JUST p(x). FOR THE COIN-FLIPPING RANDOM
VARIABLE X, WE CAN MAKE THE TABLE:

x            0      1      2
Pr(X = x)    1/4    1/2    1/4

THIS TABLE IS CALLED THE PROBABILITY DISTRIBUTION OF THE RANDOM
VARIABLE X.

FOR THE RANDOM VARIABLE Y (THE SUM OF TWO DICE), THE PROBABILITY
DISTRIBUTION LOOKS LIKE THIS:

y            2     3     4     5     6     7     8     9     10    11    12
Pr(Y = y)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

NOW LET'S DRAW GRAPHS, OR HISTOGRAMS, SHOWING THESE
PROBABILITY DISTRIBUTIONS. FOR EACH VALUE OF x, WE DRAW A BAR
EQUAL IN HEIGHT TO p(x).

[FIGURE: PROBABILITY HISTOGRAM OF X, WITH BARS AT 0, 1, AND 2]

IT'S EASY TO SEE THAT THE TOTAL AREA OF THESE BOXES IS 1: EACH BOX HAS
BASE 1 AND HEIGHT p(x), SO THE TOTAL AREA IS THE SUM OF THE
PROBABILITIES OF ALL OUTCOMES, I.E., 1.
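
The distribution of Y can be tabulated by brute force: list all 36 equally likely outcomes of two dice and count how many give each sum. The sketch below does that in Python and prints a rough text histogram; the function name is an illustrative choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sum_of_two_dice():
    """Probability distribution of Y, the sum of the dots on two fair dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {y: Fraction(c, 36) for y, c in sorted(counts.items())}


dist_y = sum_of_two_dice()
for y, p in dist_y.items():
    ways = int(p * 36)  # number of outcomes (out of 36) giving this sum
    print(f"{y:2d}  {ways}/36  " + "#" * ways)
```

Each row of `#` marks plays the role of one bar of the probability histogram, and the probabilities add up to 1.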
HERE'S THE PROBABILITY HISTOGRAM OF THE RANDOM VARIABLE Y, SHOWING
THE PROBABILITY DISTRIBUTION OF THE SUM OF TWO DICE:

[FIGURE: PROBABILITY HISTOGRAM OF Y]

WHY DO WE CALL THESE GRAPHS HISTOGRAMS? YOU'LL RECALL THAT IN
CHAPTER 2, A HISTOGRAM WAS A GRAPH THAT DISPLAYED HOW MANY DATA
POINTS LAY IN EACH OF A SERIES OF INTERVALS:



[FIGURE: FREQUENCY HISTOGRAM; HORIZONTAL AXIS: WEIGHT IN POUNDS]

FROM THIS FREQUENCY HISTOGRAM, WE DERIVED THE RELATIVE FREQUENCY
HISTOGRAM, SHOWING THE PROPORTION OF DATA IN EACH INTERVAL:

[FIGURE: RELATIVE FREQUENCY HISTOGRAM; HORIZONTAL AXIS: WEIGHT IN POUNDS]

BUT YOU'LL RECALL THAT, BY
ONE DEFINITION, PROBABILITY
IS THE RELATIVE FREQUENCY
OF AN EVENT "IN THE
LONG RUN." IF WE REPEAT
THE RANDOM EXPERIMENT
MANY TIMES, THE RELATIVE
FREQUENCY HISTOGRAM OF
THE OUTCOMES SHOULD COME
TO LOOK VERY MUCH LIKE
THE RANDOM VARIABLE'S
PROBABILITY HISTOGRAM!



WE ILLUSTRATE USING THE RANDOM
VARIABLE X AND A MAD COIN TOSSER.

THE TOSSER BEGINS FLIPPING TWO
COINS REPEATEDLY, KEEPING TRACK
OF THE RESULTS.

WE KNOW X'S PROBABILITY DISTRIBUTION, AND WE ALSO KNOW THAT THE
ACTUAL COIN FLIPS WILL MATCH THE PROBABILITIES APPROXIMATELY. AFTER
1000 TOSSES, THE MAD TOSSER TALLIES HER DATA:


PROBABILITY MODEL     OBSERVED DATA

p(x)      x      NUMBER OF OCCURRENCES n_x     RELATIVE FREQUENCY n_x/n
.25       0      260                           .260
.5        1      517                           .517
.25       2      223                           .223

AND WE SEE THAT THE PROBABILITY HISTOGRAM OF X LOOKS LIKE THE "PURE
FORM" OR MODEL OF THE RELATIVE FREQUENCY HISTOGRAM OF THE DATA.
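
The mad tosser's experiment is easy to imitate with a pseudo-random number generator. The sketch below is an illustration only: the function name and the seed are arbitrary choices, and a simulated run will give tallies close to, but not identical with, the ones in the table.

```python
import random
from collections import Counter


def toss_two_coins(n_tosses, seed=None):
    """Simulate n_tosses tosses of two fair coins and return the relative
    frequency of each value of X = number of heads (0, 1, or 2)."""
    rng = random.Random(seed)
    counts = Counter(
        rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_tosses)
    )
    return {x: counts[x] / n_tosses for x in (0, 1, 2)}


model = {0: 0.25, 1: 0.5, 2: 0.25}   # probability model p(x)
observed = toss_two_coins(1000, seed=1)
for x in (0, 1, 2):
    print(x, model[x], round(observed[x], 3))
```

Raising `n_tosses` should pull the observed relative frequencies closer to the model probabilities, which is the "long run" idea described above.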
TO EXTEND THE ANALOGY BETWEEN RELATIVE FREQUENCY AND DATA, WE
SHOULD NOW BE WILLING TO TALK ABOUT THE MEAN AND VARIANCE (OR
STANDARD DEVIATION) OF A PROBABILITY DISTRIBUTION...

AND JUST TO REMIND
OURSELVES THAT WE'RE IN
THE REALM OF THE
ABSTRACT, WE BREAK OUT
SOME GREEK LETTERS.


MEAN AND VARIANCE OF
RANDOM VARIABLES


WE USE SPECIAL TERMINOLOGY
AND SYMBOLS TO DISTINGUISH
BETWEEN THE PROPERTIES OF
DATA SETS AND PROBABILITY
DISTRIBUTIONS.

PROPERTIES OF DATA ARE CALLED SAMPLE PROPERTIES, WHILE PROPERTIES
OF THE PROBABILITY DISTRIBUTION ARE CALLED MODEL OR POPULATION
PROPERTIES. WE USE THE GREEK LETTER μ (MU) FOR THE POPULATION
MEAN, AND σ (LOWERCASE SIGMA) FOR THE POPULATION STANDARD
DEVIATION. (FOR DATA, WE USE THE ROMAN SYMBOLS x̄ AND s.)












THE SAMPLE MEAN WAS DEFINED
BY THE EQUATION

x̄ = (1/n) Σ x_i     (SUM OVER i = 1, ..., n)

NOW SOME OF THESE DATA POINTS x_i MAY WELL HAVE EQUAL VALUES. THINK
OF THE MAD COIN TOSSER: THE ONLY AVAILABLE VALUES WERE 0, 1, AND 2, AND
SHE MADE 1000 TOSSES. THE VALUE 0 WAS TAKEN ON 260 TIMES, 1 HEAD CAME
UP 517 TIMES, AND 2 HEADS, 223 TIMES.

AS WE LET x RANGE OVER
ALL VALUES OF X, CALL n_x
THE NUMBER OF DATA
POINTS WITH THE VALUE x.
THEN WE CAN REWRITE
THAT FORMULA AS

x̄ = (1/n) Σ x n_x = Σ x (n_x/n)     (SUM OVER ALL x)

(BECAUSE EACH x IS COUNTED n_x TIMES.)

AH! BUT NOW n_x/n IS THE RELATIVE FREQUENCY, THE "APPROXIMATE
PROBABILITY," THE NUMBER THAT APPROACHES p(x). SO, BY ANALOGY, WE
FORM THE EXPRESSION

Σ x p(x)     (SUM OVER ALL x)

AND DEFINE THAT AS THE
MEAN OF THE PROBABILITY
DISTRIBUTION.



DEFINITION: THE MEAN OF THE RANDOM VARIABLE X IS
DEFINED AS

μ = Σ x p(x)     (SUM OVER ALL x)

THIS IS ALSO CALLED THE EXPECTED VALUE OF X, OR E[X]. THINK OF IT AS
THE SUM OF THE POSSIBLE VALUES, EACH WEIGHTED BY ITS PROBABILITY.

THE MAD COIN TOSSER'S EXPERIMENT ALLOWS US TO COMPARE HER SAMPLE
MEAN x̄ WITH OUR MODEL MEAN μ.

SAMPLE                            MODEL

x      n_x/n     x (n_x/n)        x      p(x)     x p(x)
0      .26       0                0      .25      0
1      .517      .517             1      .5       .5
2      .223      .446             2      .25      .5

SUM              .963 = x̄        SUM             1 = μ


NOW LET'S DO THE SAME THING TO
THE VARIANCE. MAYBE YOU
REMEMBER THE FORMULA

s² = (1/(n - 1)) Σ (x_i - x̄)²     (SUM OVER i = 1, ..., n)

IT (ALMOST) MEASURES THE AVERAGE
SQUARED DISTANCE OF DATA FROM THE
MEAN. AS ABOVE, THIS CAN BE REWRITTEN

s² = Σ (x - x̄)² n_x/(n - 1)     (SUM OVER ALL x)

EXCEPT FOR THAT ANNOYING DENOMINATOR n - 1 INSTEAD OF n, THIS ALSO
LOOKS LIKE A WEIGHTED SUM OF SQUARED DISTANCES... SO WE MAKE ANOTHER
DEFINITION:


THE VARIANCE
OF A RANDOM VARIABLE X
IS THE EXPECTED SQUARED
DISTANCE FROM THE
POPULATION MEAN:

σ² = Σ (x - μ)² p(x)     (SUM OVER ALL x)

THE STANDARD DEVIATION σ
IS THE SQUARE ROOT
OF THE VARIANCE.

DO YOU SEE THAT σ² IS THE
SAME AS E[(X - μ)²]?




WE USE THE TABLE
FROM THE LAST
PAGE TO FIND THE
VARIANCE OF THE
TWO-COIN TOSS
(FOR WHICH μ = 1):

x        p(x)     (x - μ)² p(x)
0        .25      (0 - 1)² (.25) = .25
1        .5       (1 - 1)² (.5)  = 0
2        .25      (2 - 1)² (.25) = .25

TOTAL             .50 = σ²



TO SUM UP: μ AND σ, THE POPULATION MEAN AND STANDARD DEVIATION, ARE
PROPERTIES WE CAN COMPUTE FROM PROBABILITY DISTRIBUTIONS. THEY ARE
COMPLETELY ANALOGOUS TO THE SAMPLE MEAN x̄ AND STANDARD DEVIATION s
COMPUTED FROM SAMPLE DATA.
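
The two definitions translate almost word for word into code. The following Python sketch stores a distribution as a dictionary mapping each value x to p(x); that representation and the function names are assumptions made for illustration.

```python
import math


def mean(dist):
    """Population mean: the sum of x * p(x) over all x."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Population variance: the sum of (x - mu)**2 * p(x) over all x."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


two_coins = {0: 0.25, 1: 0.5, 2: 0.25}          # model p(x)
tosser = {0: 0.260, 1: 0.517, 2: 0.223}          # observed n_x / n

mu = mean(two_coins)
var = variance(two_coins)
sigma = math.sqrt(var)
print("model mean:", mu, " variance:", var, " std dev:", round(sigma, 3))
print("sample mean from relative frequencies:", round(mean(tosser), 3))
```

Feeding the relative frequencies into the same `mean` function gives the sample mean, which is exactly the analogy between data and model that the text draws.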
OUR EXAMPLES SO FAR HAVE BEEN DISCRETE RANDOM VARIABLES. THEIR
OUTCOMES ARE A SET OF ISOLATED ("DISCRETE") VALUES, LIKE THOSE WE SAW
IN CHAPTER 3. BUT THERE ARE ALSO


Continuous Random Variables

LET'S IMAGINE A RANDOM EXPERIMENT
IN WHICH ALL OUTCOMES HAVE
PROBABILITY ZERO. THAT'S RIGHT,
p(x) = 0 FOR EVERY x.



A SIMPLE EXAMPLE IS A BALANCED SPINNING POINTER. IT CAN STOP ANYWHERE
IN THE CIRCLE. IF X REPRESENTS THE PROPORTION OF THE TOTAL
CIRCUMFERENCE IT LANDS ON, THE RANDOM VARIABLE X CAN TAKE ON ANY
VALUE BETWEEN 0 AND 1, AN INFINITE RANGE OF VALUES.

SOME PROBABILITIES ARE EASY TO
FIND, LIKE THE PROBABILITY THAT X
FALLS WITHIN A RANGE. FOR
EXAMPLE, Pr(.25 ≤ X ≤ .75) = .5,
BECAUSE IT'S HALF THE CIRCLE. BUT
WHAT ABOUT Pr(X = .5)? SINCE X
CAN TAKE ON AN INFINITE NUMBER
OF VALUES, AND ALL OF THESE
VALUES ARE EQUALLY LIKELY, THE
PROBABILITY THAT X IS EXACTLY .5
(OR EXACTLY ANYTHING) IS
PRECISELY 0.
HOW CAN WE DRAW A PICTURE OF THIS?
BY ANALOGY WITH THE CASE OF
DISCRETE PROBABILITIES, WE TRY TO
SEE CONTINUOUS PROBABILITIES AS
AREAS UNDER SOMETHING. FOR THE
SPINNING POINTER, THE "SOMETHING"
LOOKS LIKE THIS:

f(x) = 0 WHEN x < 0
f(x) = 1 WHEN 0 ≤ x ≤ 1
f(x) = 0 WHEN x > 1

THE PROBABILITY THAT THE
POINTER POINTS ANYWHERE
BETWEEN a AND b IS PRECISELY
THE AREA OF THE SHADED REGION
UNDER THE CURVE BETWEEN a AND
b (IN THIS CASE, b - a).

THE PROBABILITY OF AN EXACT
OUTCOME, HOWEVER, IS THE "AREA"
OVER A POINT, WHICH IS ZERO.
(AND NOTE THAT THE TOTAL AREA
UNDER THE CURVE IS EXACTLY 1.)




THE SAME PICTURE DESCRIBES THE RANDOM NUMBER GENERATOR FOUND ON
MOST COMPUTERS AND SOME CALCULATORS. PRESS THE BUTTON, OUT POPS A
NUMBER BETWEEN 0 AND 1, AND ALL THE NUMBERS ARE EQUALLY LIKELY, JUST
AS WITH THE SPINNING POINTER.

BUT SADLY, THEY AREN'T
TRULY RANDOM. THEY'RE
PRODUCED BY SOME
ALGORITHM, SO, TO BE
ACCURATE, WE CALL THEM
PSEUDO-RANDOM NUMBERS.

THE CURVE y = f(x) IN THIS
EXAMPLE IS CALLED THE
PROBABILITY DENSITY OF THE
CONTINUOUS RANDOM VARIABLE X.
EVERY CONTINUOUS RANDOM
VARIABLE HAS ITS OWN DENSITY
FUNCTION. THE PROBABILITY
Pr(a ≤ X ≤ b) IS THE AREA
UNDER THE CURVE BETWEEN THE
x-VALUES a AND b.
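
To make "probability is area under the density" concrete, here is a rough Python sketch that adds up thin rectangles under the spinning pointer's density. The midpoint rule and the helper names are choices made for this illustration, not something taken from the text.

```python
def spinner_density(x):
    """Density f(x) of the spinning pointer: 1 on [0, 1], 0 elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


print(area_under(spinner_density, 0.25, 0.75))  # Pr(.25 <= X <= .75)
print(area_under(spinner_density, -1.0, 2.0))   # total area, close to 1
print(area_under(spinner_density, 0.5, 0.5))    # "area" over a single point
```

The last call has zero width, so its area is zero however tall the density is, which is why an exact outcome has probability 0.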



IN GENERAL, THE PROBABILITY
DENSITY WON'T BE SO SIMPLE,
AND COMPUTING THE AREAS CAN
BE FAR FROM TRIVIAL.

WE HAVE TO USE CALCULUS
NOTATION TO DESCRIBE THE
AREA UNDER THE CURVE:

Pr(a ≤ X ≤ b) = ∫_a^b f(x) dx

THIS SYMBOL IS READ "THE
INTEGRAL OF f FROM a TO b."

LIKE DISCRETE PROBABILITIES,
CONTINUOUS DENSITIES HAVE
TWO FAMILIAR PROPERTIES:

f(x) ≥ 0

∫ f(x) dx = 1     (INTEGRAL FROM -∞ TO ∞)

(TRY NOT TO BE ALARMED BY THOSE
INFINITIES. THEY JUST MEAN WE'RE
LOOKING AT THE TOTAL AREA UNDER
THE CURVE FROM END TO END,
EXCEPT THAT THERE IS NO END!)
ADDING

random variables

ONCE YOU KNOW THE MEAN AND
VARIANCE OF A RANDOM VARIABLE,
WHAT CAN YOU DO WITH THEM?
WELL, FOR ONE THING, YOU CAN
FIND THE MEAN AND VARIANCE OF
SOME OTHER RANDOM VARIABLES...


FOR EXAMPLE, LOOK AT A FAIR COIN TOSS. LET X = 1 IF THE COIN COMES UP
HEADS AND 0 IF IT COMES UP TAILS.

[FIGURE: HISTOGRAM OF p(x), WITH p(0) = p(1) = .5]

BY NOW, YOU SHOULD BE ABLE
TO FIND THE MEAN

E[X] = 0·p(0) + 1·p(1) = 0 + .5 = .5

AND THE VARIANCE

σ² = (0 - .5)² p(0) + (1 - .5)² p(1) = .25



NOW LET'S PLAY A SIMPLE GAMBLING GAME: YOU ANTE UP $6.00 TO PLAY. I
FLIP A COIN. YOU WIN $10 IF THE COIN COMES UP HEADS, ZERO IF TAILS. THEN
YOUR WINNINGS W ARE

W = 10X - 6

A NEW RANDOM VARIABLE!
WHAT ARE ITS MEAN AND
VARIANCE?








IN GENERAL, IT IS NOT HARD
TO SHOW THAT

E[aX + b] = aE[X] + b

WHEN a AND b ARE ANY
NUMBERS AND X IS ANY RANDOM
VARIABLE. FOR THE VARIANCE,
THERE'S ALSO A GENERAL
RESULT:

σ²(aX + b) = a² σ²(X)
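
The two rules can be checked on the gambling game by computing the mean and variance of W = 10X - 6 directly from its own distribution and comparing with the shortcut formulas. This is a sketch; the dictionary representation and helper names are illustrative assumptions.

```python
def mean(dist):
    """Sum of x * p(x) over all x."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Sum of (x - mu)**2 * p(x) over all x."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def linear_transform(dist, a, b):
    """Distribution of aX + b, given the distribution of X (a nonzero)."""
    return {a * x + b: p for x, p in dist.items()}


coin = {0: 0.5, 1: 0.5}                      # X = 1 for heads, 0 for tails
winnings = linear_transform(coin, 10, -6)    # W = 10X - 6

# Left side computed directly, right side from the general rules.
print(mean(winnings), 10 * mean(coin) - 6)
print(variance(winnings), 10 ** 2 * variance(coin))
```

Under these rules the game costs the player a dollar per play on average, with a variance of 25 (a standard deviation of 5 dollars).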









YOU CAN ALSO ADD TWO RANDOM VARIABLES TOGETHER. FOR INSTANCE,
SUPPOSE WE TOSS A COIN TWICE. THE NUMBER OF HEADS ON BOTH TOSSES IS
X_1 + X_2, WHERE X_1 AND X_2 ARE THE RANDOM VARIABLES GIVING THE
RESULTS OF THE FIRST AND SECOND TOSSES.

AGAIN, IT'S EASY TO SEE THAT

E[X_1 + X_2] = E[X_1] + E[X_2]

(DON'T ASK ABOUT THE PROBABILITY DISTRIBUTION OF X_1 + X_2, BECAUSE IT
DEPENDS IN A COMPLICATED WAY ON THE TWO ORIGINAL DISTRIBUTIONS. FOR
EXAMPLE, IF X_1 AND X_2 ARE BOTH THE SPINNING POINTER DISTRIBUTION, THE
HISTOGRAMS ACT LIKE THIS.)







THE VARIANCE OF THE SUM OF RANDOM VARIABLES HAS A SIMPLE FORM IN
THE SPECIAL CASE WHEN THE VARIABLES X AND Y ARE INDEPENDENT:

σ²(X + Y) = σ²(X) + σ²(Y)

THE TECHNICAL DEFINITION OF INDEPENDENCE IS BASED ON THE PROBABILITY
PROPERTY P(A AND B) = P(A)P(B)... BUT FOR US, INDEPENDENCE JUST MEANS
THAT X AND Y ARE GENERATED BY INDEPENDENT MECHANISMS, SUCH AS
FLIPS OF A COIN, ROLLS OF A DIE, ETC.

ALL OF THIS CAN BE GENERALIZED TO THE SUM OF MANY RANDOM VARIABLES:

E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n]

AND, WHEN THE X_i ARE ALL INDEPENDENT,

σ²(X_1 + X_2 + ... + X_n) = σ²(X_1) + σ²(X_2) + ... + σ²(X_n)
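
For discrete variables the distribution of a sum of independent variables can be built by multiplying probabilities pair by pair. The Python sketch below does this for two coin tosses and confirms that the means and the variances add; the helper names are illustrative, and independence is assumed throughout.

```python
from itertools import product


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def add_independent(dist_x, dist_y):
    """Distribution of X + Y when X and Y are independent:
    Pr(X = x AND Y = y) = p(x) * p(y) for every pair (x, y)."""
    total = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        total[x + y] = total.get(x + y, 0.0) + px * py
    return total


coin = {0: 0.5, 1: 0.5}
two_tosses = add_independent(coin, coin)
print(two_tosses)
print(mean(two_tosses), mean(coin) + mean(coin))
print(variance(two_tosses), variance(coin) + variance(coin))
```

The result is the familiar two-coin distribution, with mean 1 and variance .5, each the sum of the two single-toss values.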
IN THE NEXT CHAPTER, WE WILL SEE TWO IMPORTANT EXAMPLES OF RANDOM
VARIABLES: ONE, THE BINOMIAL, IS THE SUM OF MANY REPEATED INDEPENDENT
RANDOM VARIABLES. THE OTHER, THE NORMAL, IS A CONTINUOUS RANDOM
VARIABLE THAT HAS A SURPRISING RELATIONSHIP TO THE BINOMIAL, AND ANY
OTHER SUM OF INDEPENDENT RANDOM VARIABLES AS WELL.





♦ Chapter 5 ♦ 


A TALE OF TWO 
DISTRIBUTIONS 


NOW WE LOOK AT TWO IMPORTANT EXAMPLES OF
RANDOM VARIABLES, ONE DISCRETE AND ONE CONTINUOUS.


WE BEGIN WITH THE DISCRETE ONE, CALLED THE BINOMIAL RANDOM VARIABLE.
SUPPOSE WE HAVE A RANDOM PROCESS WITH JUST TWO POSSIBLE OUTCOMES:
A HEADS-OR-TAILS COIN TOSS, A WIN-OR-LOSE FOOTBALL GAME, A
PASS-OR-FAIL AUTOMOTIVE SMOG INSPECTION. WE ARBITRARILY CALL ONE OF
THESE OUTCOMES A SUCCESS AND THE OTHER A FAILURE.



WHAT WE DO IS TO REPEAT THIS EXPERIMENT... WELL, REPEATEDLY. SUCH A
REPEATABLE EXPERIMENT IS CALLED A

Bernoulli trial,

(NO PICTURE OF BERNOULLI)

PROVIDED IT HAS THESE CRITICAL
PROPERTIES:

1) THE RESULT OF EACH TRIAL
MAY BE EITHER A SUCCESS OR
A FAILURE.

2) THE PROBABILITY p OF
SUCCESS IS THE SAME IN
EVERY TRIAL.

3) THE TRIALS ARE INDEPENDENT:
THE OUTCOME OF ONE TRIAL HAS
NO INFLUENCE ON LATER OUTCOMES.






STARTING WITH A BERNOULLI TRIAL, WITH PROBABILITY OF SUCCESS p, LET'S
BUILD A NEW RANDOM VARIABLE BY REPEATING THE BERNOULLI TRIAL.


The binomial random variable

X IS THE NUMBER OF SUCCESSES
IN n REPEATED
BERNOULLI TRIALS WITH
PROBABILITY p OF SUCCESS.

"HOW MANY TIMES WILL I PASS n SMOG TESTS?"


AN EXAMPLE OF A BINOMIAL RANDOM VARIABLE IS THE NUMBER OF HEADS
(SUCCESSES) IN TWO FLIPS OF A COIN. HERE n = 2 AND p = .5.


NUMBER OF SUCCESSES k     0       1      2
Pr(X = k)                 .25     .5     .25



ANOTHER EXAMPLE IS DE MERE'S FIRST GAMBLE: TOSSING A SINGLE DIE
FOUR TIMES IN A ROW. SUCCESS MEANS ROLLING A 6. THE DISTRIBUTION IS:

[FIGURE: THE DISTRIBUTION OF THE NUMBER OF SIXES IN FOUR ROLLS]






IN GENERAL, WHAT'S THE PROBABILITY DISTRIBUTION OF THE
BINOMIAL FOR ANY PROBABILITY
p AND NUMBER OF TRIALS n? A
PROBABILITY CALCULATION GIVES
THE ANSWER: THE PROBABILITY
OF OBTAINING k SUCCESSES IN
n TRIALS, Pr(X = k), IS

Pr(X = k) = C(n, k) p^k (1 - p)^(n - k)

HERE C(n, k), READ "n CHOOSE k," IS THE BINOMIAL COEFFICIENT. IT COUNTS
ALL POSSIBLE WAYS OF GETTING k SUCCESSES IN n TRIALS. EACH INDIVIDUAL
SEQUENCE OF k SUCCESSES AND n - k FAILURES HAS PROBABILITY
p^k (1 - p)^(n - k), BY THE MULTIPLICATION RULE. THERE ARE C(n, k) OF
THESE SEQUENCES.



THE FORMULA FOR C(n, k) IS

C(n, k) = n! / (k! (n - k)!)

WHERE

n! = n × (n - 1) × (n - 2) × ... × 1

AND 0! IS TAKEN TO BE 1. FOR INSTANCE,
C(4, 2), THE NUMBER OF POSSIBLE WAYS TO
CHOOSE TWO LETTERS FROM A SET OF
FOUR LETTERS, IS

C(4, 2) = 4! / (2! 2!) = 24 / 4 = 6

AB  AC  AD
BC  BD  CD
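
The binomial formula is short enough to evaluate directly. The sketch below uses Python's standard `math.comb` for the binomial coefficient; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Pr(X = k) for a binomial random variable with n trials and
    probability p of success on each trial."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


print(comb(4, 2))  # ways to choose two letters out of four
# Number of heads in two coin flips (n = 2, p = .5):
print([round(binomial_pmf(k, 2, 0.5), 3) for k in range(3)])
# Number of sixes in four rolls of a die (n = 4, p = 1/6):
print([round(binomial_pmf(k, 4, 1 / 6), 3) for k in range(5)])
```

The last line gives the lopsided distribution for de Mere's four rolls: no six at all is the single most likely outcome.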





ANOTHER VIEW OF THE BINOMIAL COEFFICIENTS IS IN PASCAL'S TRIANGLE.
EACH ENTRY IS THE SUM OF THE TWO NUMBERS JUST ABOVE IT.

[FIGURE: PASCAL'S TRIANGLE; ITS LAST ROWS ARE]

1  10  45  120  210  252  210  120  45  10  1
1  11  55  165  330  462  462  330  165  55  11  1
1  12  66  220  495  792  924  792  495  220  66  12  1
ETC.

TO FIND C(n, k), JUST COUNT DOWN TO ROW n AND OVER TO ENTRY k
(REMEMBERING ALWAYS TO START COUNTING FROM ZERO).
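
The "sum of the two numbers just above" rule is all that is needed to generate the triangle. A minimal Python sketch, with an illustrative function name:

```python
def pascal_row(n):
    """Row n of Pascal's triangle, counting rows and entries from zero."""
    row = [1]
    for _ in range(n):
        # Each interior entry is the sum of the two entries just above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


for n in (10, 11, 12):
    print(pascal_row(n))

print(pascal_row(4)[2])  # C(4, 2): row 4, entry 2
```

Looking up `pascal_row(n)[k]` is the code version of counting down to row n and over to entry k.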


WHEN p = .5, THE BINOMIAL'S
PROBABILITY DISTRIBUTION IS
PERFECTLY SYMMETRICAL. FOR
6 COIN FLIPS, FOR INSTANCE, IT'S

[FIGURE: PROBABILITY HISTOGRAM FOR k = # HEADS = 0, 1, 2, 3, 4, 5, 6]
FOR DE MERE'S ROLL OF FOUR DICE, THE DISTRIBUTION IS MORE LOPSIDED.

THE MEAN AND VARIANCE OF THE
BINOMIAL DISTRIBUTION ARE

μ = np

σ² = np(1 - p)

NOTE THAT THE MEAN MAKES
INTUITIVE SENSE: IN n BERNOULLI
TRIALS, THE EXPECTED NUMBER OF
SUCCESSES SHOULD BE np. THE
VARIANCE FOLLOWS FROM THE
FACT THAT THE BINOMIAL IS THE
SUM OF n INDEPENDENT BERNOULLI
TRIALS OF VARIANCE p(1 - p).



THE PARAMETERS OF THE BINOMIAL DISTRIBUTION ARE n AND p. THE
DISTRIBUTION, MEAN, AND VARIANCE DEPEND ONLY ON THESE TWO NUMBERS.
TABLES OF THE BINOMIAL DISTRIBUTION APPEAR IN MOST TEXTBOOKS AND
COMPUTER PROGRAMS. HERE IS A TABLE FOR n = 10:

VALUES OF Pr(X = k)

p \ k   0      1      2      3      4      5      6      7      8      9      10
.1      0.349  0.387  0.194  0.057  0.011  0.001  0.000  0.000  0.000  0.000  0.000
.25     0.056  0.188  0.282  0.250  0.146  0.058  0.016  0.003  0.000  0.000  0.000
.50     0.001  0.010  0.044  0.117  0.205  0.246  0.205  0.117  0.044  0.010  0.001
.75     0.000  0.000  0.000  0.003  0.016  0.058  0.146  0.250  0.282  0.188  0.056
.9      0.000  0.000  0.000  0.000  0.000  0.001  0.011  0.057  0.194  0.387  0.349
BUT CALCULATING
THESE THINGS FOR
LARGE VALUES OF n
CAN BE A PAIN... OR AT
LEAST, IT WAS BACK IN
THE 18TH CENTURY,
WHEN JAMES
BERNOULLI AND
ABRAHAM DE MOIVRE
WERE TRYING TO DO
IT WITHOUT A
COMPUTER.



DEPLOYING A NEWLY
INVENTED WEAPON, THE
CALCULUS, DE MOIVRE
SHOWED THAT WHEN p = .5,
THE BINOMIAL DISTRIBUTION
WAS CLOSELY
APPROXIMATED BY A
CONTINUOUS DENSITY
FUNCTION WHICH COULD BE
DESCRIBED VERY SIMPLY.


TO SEE HOW THIS WORKS, IMAGINE THE BINOMIAL DISTRIBUTION WITH p = .5
AND n VERY LARGE: A MILLION, SAY.



NOW, SAID DE MOIVRE, SLIDE THIS
GRAPH OVER, SO ITS MEAN IS ZERO.

SQUASH THE CURVE ALONG THE x AXIS
UNTIL THE STANDARD DEVIATION
BECOMES 1, WHILE STRETCHING IT
ALONG THE y AXIS TO KEEP THE AREA
UNDER IT EQUAL TO 1.

THE RESULT IS VERY CLOSE TO A SMOOTH, SYMMETRICAL, BELL-SHAPED
CURVE, WHICH DE MOIVRE SHOWED WAS GIVEN BY THE SIMPLE FORMULA:


f(z) = (1/√(2π)) e^(-z²/2)

THIS FUNCTION IS CALLED THE

standard normal
distribution.

(e IS A USEFUL MATHEMATICAL CONSTANT, APPROXIMATELY 2.718.)

(CONVINCE YOURSELF THAT THIS FUNCTION REALLY HAS A BELL-SHAPED
GRAPH: FOR z FAR FROM ZERO, f(z) IS VERY NEARLY ZERO, SINCE IT HAS A BIG
DENOMINATOR. IT'S SYMMETRICAL, SINCE f(z) = f(-z). AND IT HAS A
MAXIMUM AT z = 0.)

THE DISTRIBUTION IS CALLED THE
STANDARD NORMAL BECAUSE ALL
THAT SQUASHING AND STRETCHING
WAS SPECIALLY ARRANGED TO GIVE
IT THESE SIMPLE PROPERTIES,
WHICH WE PRESENT WITHOUT
PROOF:

μ = 0

σ = 1



















TO SUMMARIZE DE MOIVRE:
IF YOU "NORMALIZE" THE
BINOMIAL DISTRIBUTION
WITH p = 1/2 (I.E., CENTER
IT ON ZERO AND MAKE ITS
STANDARD DEVIATION = 1),
THEN IT CLOSELY FITS
THE STANDARD NORMAL
DISTRIBUTION

f(z) = (1/√(2π)) e^(-z²/2)




OTHER NORMALS, WITH DIFFERENT MEANS AND VARIANCES, ARE OBTAINED BY
STRETCHING AND SLIDING THE STANDARD NORMAL. IN GENERAL, WE WRITE THE
FORMULA

f(x | μ, σ) = (1/(σ√(2π))) e^(-(x - μ)²/(2σ²))

THIS GIVES A SYMMETRIC,
BELL-SHAPED DISTRIBUTION
CENTERED ON THE MEAN μ
WITH THE STANDARD
DEVIATION σ.

HERE ARE TWO DIFFERENT NORMALS WITH THE REGIONS WITHIN ONE
STANDARD DEVIATION OF THEIR MEANS SHADED.

[FIGURE: f_1 WITH SMALL σ_1; f_2 WITH LARGE σ_2]





DE MOIVRE PROVED THAT THE STANDARD NORMAL FITS THE (NORMALIZED)
BINOMIAL WITH p = .5, BUT, IN FACT, IT WORKS FOR ANY VALUE OF p.


GENERALLY: FOR ANY
VALUE OF p, THE
BINOMIAL DISTRIBUTION
OF n TRIALS WITH
PROBABILITY p IS
APPROXIMATED BY THE
NORMAL CURVE WITH
μ = np AND
σ = √(np(1 - p)).




THIS IS ACTUALLY A
LITTLE STRANGE. ALL
NORMALS ARE
SYMMETRICAL AND
BELL-SHAPED, BUT, AS
WE SAW, BINOMIAL
DISTRIBUTIONS ARE
NOT SYMMETRICAL
WHEN p ≠ .5.


BUT IT TURNS OUT THAT AS n GETS LARGE, THE BINOMIAL'S ASYMMETRY IS
OVERWHELMED, AS YOU SEE IN THIS EXAMPLE.




IN FACT, DE MOIVRE'S DISCOVERY ABOUT THE BINOMIAL IS A SPECIAL CASE OF AN
EVEN MORE GENERAL RESULT, WHICH HELPS EXPLAIN WHY THE NORMAL IS SO
IMPORTANT AND WIDESPREAD IN NATURE. IT IS THIS:


"Fuzzy Central Limit Theorem":

DATA THAT ARE
INFLUENCED BY MANY
SMALL AND UNRELATED
RANDOM EFFECTS ARE
APPROXIMATELY NORMALLY
DISTRIBUTED.

THIS EXPLAINS WHY THE NORMAL IS EVERYWHERE: STOCK MARKET
FLUCTUATIONS, STUDENT WEIGHTS, YEARLY TEMPERATURE AVERAGES, S.A.T.
SCORES: ALL ARE THE RESULT OF MANY DIFFERENT EFFECTS. FOR EXAMPLE,
A STUDENT'S WEIGHT IS THE RESULT OF GENETICS, NUTRITION, ILLNESS, AND
LAST NIGHT'S BEER PARTY. WHEN YOU PUT THEM ALL TOGETHER, YOU GET
THE NORMAL! (REMEMBER, THE BINOMIAL IS THE RESULT OF n INDEPENDENT
BERNOULLI TRIALS.)
THE z TRANSFORMATION

z = (x - μ)/σ

CHANGES A NORMAL
RANDOM VARIABLE WITH
MEAN μ AND STANDARD
DEVIATION σ INTO A
STANDARD NORMAL
RANDOM VARIABLE WITH
MEAN 0 AND STANDARD
DEVIATION 1.

[FIGURE: THE x AXIS MARKED μ - σ, μ, μ + σ MAPS TO THE z AXIS MARKED -1, 0, 1]

"IT'S ANOTHER SQUISHING, SLIDING OPERATION..."

THEN ALL WE NEED TO FIND PROBABILITIES FOR ANY NORMAL DISTRIBUTION IS
THE SINGLE TABLE FOR THE STANDARD NORMAL F(a).


z      -2.5   -2.4   -2.3   -2.2   -2.1   -2.0   -1.9   -1.8   -1.7   -1.6
F(z)   0.006  0.008  0.011  0.014  0.018  0.023  0.029  0.036  0.045  0.055

z      -1.5   -1.4   -1.3   -1.2   -1.1   -1.0   -0.9   -0.8   -0.7   -0.6
F(z)   0.067  0.081  0.097  0.115  0.136  0.159  0.184  0.212  0.242  0.274

z      -0.5   -0.4   -0.3   -0.2   -0.1   0.0    0.1    0.2    0.3    0.4
F(z)   0.309  0.345  0.382  0.421  0.460  0.500  0.540  0.579  0.618  0.655

z      0.5    0.6    0.7    0.8    0.9    1.0    1.1    1.2    1.3    1.4
F(z)   0.691  0.726  0.758  0.788  0.816  0.841  0.864  0.885  0.903  0.919

z      1.5    1.6    1.7    1.8    1.9    2.0    2.1    2.2    2.3    2.4
F(z)   0.933  0.945  0.955  0.964  0.971  0.977  0.982  0.986  0.989  0.992

z      2.5
F(z)   0.994


HERE F(a) = Pr(z ≤ a), THE AREA UNDER THE DENSITY CURVE TO THE LEFT
OF z = a.

[FIGURE: STANDARD NORMAL DENSITY WITH THE AREA F(a) SHADED]

(WE CAN ALSO GRAPH THE CURVE F(a), THE CUMULATIVE PROBABILITY.
IT LOOKS LIKE THIS.)

[FIGURE: THE CUMULATIVE PROBABILITY CURVE]




THE TABLE ALLOWS US TO FIND THE
PROBABILITY OF z BEING IN ANY INTERVAL
a ≤ z ≤ b. IT IS JUST THE DIFFERENCE
BETWEEN THE AREAS F(b) AND F(a):

Pr(a ≤ z ≤ b) = F(b) - F(a)

SO, FOR EXAMPLE,

Pr(-1 ≤ z ≤ 1) = F(1) - F(-1)
               = .8413 - .1587
               = .6826

Pr(z ≥ 2) = 1 - F(2)
          = 1 - .9772
          = .0228


USING THE SUBSTITUTION z = (x - μ)/σ, WE CAN USE
THE SAME TABLE TO FIND
PROBABILITIES FOR OTHER
NORMAL DISTRIBUTIONS.

SUPPOSE STUDENT WEIGHTS ARE NORMALLY DISTRIBUTED WITH MEAN
μ = 150 POUNDS AND STANDARD DEVIATION σ = 20 POUNDS.

[FIGURE: NORMAL CURVE OF WEIGHTS, AXIS MARKED 130, 150, 170]

THEN WHAT'S THE PROBABILITY OF WEIGHING
MORE THAN 170 POUNDS?

NOW IT'S "JUST" ALGEBRA:

Pr(X > 170) = Pr((X - μ)/σ > (170 - 150)/20) = Pr(z > 1)

THAT'S 1 - F(1), WHICH WE CAN READ FROM THE
TABLE AS 1 - .8413 = .1587.

[FIGURE: AREA = .1587 TO THE RIGHT OF 170]

A LITTLE LESS THAN ONE STUDENT IN SIX TIPS
THE SCALES ABOVE 170 POUNDS.


THE GENERAL RULE FOR COMPUTING NORMAL PROBABILITIES IS THEREFORE:

Pr(a ≤ X ≤ b) = F((b - μ)/σ) - F((a - μ)/σ)
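
Where the text reads F from a printed table, a program can compute it. One standard way, used in the sketch below, is to express the standard normal cumulative probability through the error function, `math.erf`, from Python's standard library. The function names are illustrative, and the weight example assumes a mean of 150 pounds and a standard deviation of 20 pounds, the values implied by the z-value of 1 in the calculation.

```python
from math import erf, sqrt


def std_normal_cdf(a):
    """F(a) = Pr(z <= a) for the standard normal distribution."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))


def normal_prob(a, b, mu, sigma):
    """Pr(a <= X <= b) for a normal X with mean mu and std deviation sigma,
    using the z transformation z = (x - mu) / sigma."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)


print(round(std_normal_cdf(1) - std_normal_cdf(-1), 4))   # Pr(-1 <= z <= 1)
print(round(1 - std_normal_cdf(2), 4))                    # Pr(z >= 2)
print(round(1 - std_normal_cdf((170 - 150) / 20), 4))     # Pr(X > 170)
print(round(normal_prob(130, 170, 150, 20), 4))           # Pr(130 <= X <= 170)
```

Computed values can differ from table arithmetic in the last digit, because the table entries are themselves rounded before they are subtracted.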















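NOW BACK TO DE MOIVRE
AND HIS BINOMIAL
APPROXIMATION... LET'S
LOOK AT A BINOMIAL
DISTRIBUTION WITH n = 25
TRIALS AND p = .5 (25
COIN FLIPS, SAY). WE CAN
COMPUTE (OR LOOK UP IN
A TABLE) ANY PROBABILITY,
FOR EXAMPLE, Pr(X ≤ 14).
IT IS .7878 EXACTLY.


NOW CALCULATE A NORMAL RANDOM VARIABLE X* WITH THE SAME MEAN
np = (25)(.5) = 12.5 AND STANDARD DEVIATION σ = √(np(1 - p)) = 2.5.

Pr(X* ≤ 14) = Pr(Z ≤ (14 - 12.5)/2.5)
            = Pr(Z ≤ .6)
            = .7257

THAT MISSES THE EXACT VALUE BY A WIDE MARGIN. THE FIT IMPROVES IF WE
GO OUT AN EXTRA .5, TO THE EDGE OF THE HISTOGRAM BAR AT 14:

Pr(X* ≤ 14.5) = Pr(Z ≤ (14.5 - 12.5)/2.5)
              = Pr(Z ≤ .8)
              = .7881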









THAT LITTLE EXTRA .5 WE
ADDED IS CALLED THE

continuity
correction.

WE HAVE TO INCLUDE IT
TO GET A GOOD
CONTINUOUS
APPROXIMATION TO OUR
DISCRETE BINOMIAL
RANDOM VARIABLE X. IT'S
SUMMARIZED BY THIS ONE
HIDEOUS FORMULA:

Pr(a ≤ X ≤ b) ≈ Pr( (a - 1/2 - np)/√(np(1 - p)) ≤ Z ≤ (b + 1/2 - np)/√(np(1 - p)) )


WHEN IS THIS APPROXIMATION "GOOD ENOUGH"? FOR STATISTICIANS, THE
RULE OF THUMB IS: WHENEVER n IS BIG ENOUGH TO MAKE THE NUMBER OF
EXPECTED SUCCESSES AND FAILURES BOTH AT LEAST FIVE:

np ≥ 5 AND n(1 - p) ≥ 5

YOU CAN SEE FROM THESE HISTOGRAMS THAT THE FIT WHEN p = 0.1 IS
MEDIOCRE OR WORSE UNTIL n REACHES 50, MAKING np = 5.

[FIGURE: BINOMIAL HISTOGRAMS FOR n = 2, p = 0.1; n = 10, p = 0.1; n = 50, p = 0.1]
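
The effect of the continuity correction can be seen by computing the 25-coin-flip example three ways: exactly from the binomial formula, from the normal curve without the extra .5, and from the normal curve with it. The Python below is a sketch with illustrative function names; it also encodes the rule of thumb as a one-line check.

```python
from math import comb, erf, sqrt


def binomial_cdf(k, n, p):
    """Exact Pr(X <= k) for a binomial with n trials and success probability p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def std_normal_cdf(a):
    """F(a) = Pr(Z <= a) for the standard normal."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))


def normal_approx_cdf(k, n, p, correction=True):
    """Normal approximation to Pr(X <= k), optionally adding the .5
    continuity correction to the upper limit."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    upper = k + 0.5 if correction else k
    return std_normal_cdf((upper - mu) / sigma)


def rule_of_thumb_ok(n, p):
    """True when np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5


n, p, k = 25, 0.5, 14
print("exact binomial:      ", round(binomial_cdf(k, n, p), 4))
print("normal, no correction:", round(normal_approx_cdf(k, n, p, correction=False), 4))
print("normal, corrected:    ", round(normal_approx_cdf(k, n, p), 4))
print([rule_of_thumb_ok(m, 0.1) for m in (2, 10, 50)])
```

With the correction the normal value lands within about .0003 of the exact binomial probability; without it the gap is roughly .06.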
WHAT'S SO GREAT ABOUT THIS NORMAL APPROXIMATION? THE BINOMIAL
DISTRIBUTION OCCURS COMMONLY IN NATURE, AND IT ISN'T HARD TO
UNDERSTAND, BUT IT CAN BE TIRESOME TO CALCULATE.

THERE'S A NEW ONE
FOR EVERY VALUE
OF n AND p...


THE NORMAL WHICH APPROXIMATES IT MAY BE LESS INTUITIVE, BUT IT'S VERY
EASY TO USE. THE z TRANSFORM CONVERTS ANY NORMAL TO THE STANDARD
NORMAL, ALLOWING US TO READ PROBABILITIES STRAIGHT OUT OF A SINGLE
NUMERICAL TABLE.


AND BESIDES, THE NORMAL REALLY IS THE
MOTHER OF ALL DISTRIBUTIONS!


THAT'S THE
FUZZY CENTRAL
LIMIT THEOREM!







♦ Chapter 6 ♦

SAMPLING


BY NOW, AFTER A STEADY DIET OF COINS, DICE, AND ABSTRACT
IDEAS, YOU MAY BE WONDERING WHAT ALL THIS STATISTICAL
EQUIPMENT WE'VE BEEN BUILDING HAS TO DO WITH THE REAL
WORLD. WELL, NOW WE'RE FINALLY GOING TO FIND OUT.



IN THIS CHAPTER, WE BEGIN LOOKING AT THE REAL BUSINESS OF STATISTICS,
WHICH IS, AFTER ALL, TO SAVE PEOPLE TIME AND MONEY. PEOPLE HATE TO
WASTE TIME DOING UNNECESSARY WORK, AND ONE THING STATISTICS CAN DO
IS TELL US EXACTLY HOW LAZY WE CAN AFFORD TO BE.
THE PROBLEM WITH THE WORLD IS THAT THE COLLECTIONS OF STUFF IN IT
ARE SO LARGE, IT'S HARD TO GET THE INFORMATION WE WANT:

VOTING POPULATIONS:
WHAT PERCENTAGE
FAVORS EACH CANDIDATE?

MANUFACTURED GOODS:
WHAT PROPORTION WILL
BE DEFECTIVE?

PICKLES: WHAT'S THEIR
AVERAGE LENGTH?
THE PICKLE-JAR MAKERS
NEED TO KNOW!

THE INDUSTRIOUS,
HARD-WORKING,
SIMPLE-MINDED,
BEAVERLIKE WAY TO
ANSWER THESE
QUESTIONS WOULD
BE TO MEASURE
EVERY SINGLE
PICKLE IN THE
WORLD (SAY) AND
DO SOME
ARITHMETIC.

BUT WE AREN'T BEAVERS. WE'RE
STATISTICIANS! WE'RE LOOKING...
OUR METHOD IS TO TAKE
A SAMPLE... A
RELATIVELY SMALL
SUBSET OF THE TOTAL
POPULATION, THE WAY
POLLSTERS DO AT
ELECTION TIME.










NOT TO PROLONG THE MYSTERY, THE WAY TO GET STATISTICALLY DEPENDABLE
RESULTS IS TO CHOOSE THE SAMPLE AT random.


THE SIMPLE RANDOM SAMPLE


SUPPOSE WE HAVE A LARGE
POPULATION OF OBJECTS AND A
PROCEDURE FOR SELECTING n OF
THEM. IF THE PROCEDURE
ENSURES THAT ALL POSSIBLE
SAMPLES OF n OBJECTS ARE
EQUALLY LIKELY, THEN WE CALL
THE PROCEDURE A simple
random sample.



THE SIMPLE RANDOM SAMPLE HAS TWO PROPERTIES THAT MAKE IT THE
STANDARD AGAINST WHICH WE MEASURE ALL OTHER METHODS:

1) UNBIASED: EACH UNIT HAS THE SAME
CHANCE OF BEING CHOSEN.

2) INDEPENDENCE: SELECTION OF ONE
UNIT HAS NO INFLUENCE ON THE
SELECTION OF OTHER UNITS.


UNFORTUNATELY, IN THE REAL WORLD, COMPLETELY UNBIASED, INDEPENDENT
SAMPLES ARE HARD TO FIND. FOR INSTANCE, SURVEYING VOTERS BY RANDOMLY
DIALING TELEPHONE NUMBERS IS BIASED: IT IGNORES VOTERS WITHOUT A
TELEPHONE AND OVERSAMPLES PEOPLE WITH MORE THAN ONE NUMBER.
IT'S THEORETICALLY POSSIBLE
TO GET A RANDOM SAMPLE BY
BUILDING A SAMPLING
FRAME: A LIST OF EVERY
UNIT IN THE POPULATION. BY
USING A RANDOM NUMBER
GENERATOR, WE CAN PICK n
OBJECTS AT RANDOM.

EQUIVALENTLY, WE CAN PUT ALL THE
NAMES ON CARDS AND PULL n OF
THEM OUT OF A DRUM.
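
Given a sampling frame, drawing a simple random sample takes one call to Python's standard `random.sample`, which picks n distinct items so that every subset of size n is equally likely. The frame of numbered pickles below is made up for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, with every possible
    sample of n units equally likely (up to the quality of the generator)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: a list naming every unit in the population.
frame = [f"pickle-{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

This is the programmed version of pulling cards out of a drum; fixing the seed only makes the draw repeatable.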




BUT THIS IS NOT ALWAYS EASY. MAKING THE FRAME MAY BE PROHIBITIVELY
COSTLY, CONTROVERSIAL, OR EVEN IMPOSSIBLE. FOR EXAMPLE, AN E.P.A. WATER
QUALITY STUDY NEEDED A SAMPLING FRAME OF LAKES IN THE U.S., SO THEN
SOMEBODY HAS TO DECIDE...


ARE THERE OTHER WAYS TO SAMPLE THAT ARE MORE EFFICIENT AND
COST-EFFECTIVE THAN A SIMPLE RANDOM SAMPLE? YES! IF YOU ALREADY KNOW
SOMETHING ABOUT THE POPULATION. FOR INSTANCE...





[FIGURE: PICKLES SORTED INTO JARS BY TYPE]


Stratified

SAMPLING: DIVIDE THE
POPULATION UNITS INTO
HOMOGENEOUS GROUPS
(STRATA) AND DRAW A
SIMPLE RANDOM SAMPLE
FROM EACH GROUP.


FOR EXAMPLE, THE POPULATION OF ALL PICKLES CAN BE STRATIFIED BY
TYPE OF PICKLE. WITHIN EACH TYPE OR STRATUM, THE SIZE SHOULD BE
LESS VARIABLE.
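
A stratified sample is just a simple random sample repeated inside each stratum. The sketch below assumes the strata are already known and stored as lists; the pickle types and lengths are invented for illustration.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


# Hypothetical pickle lengths in centimeters, grouped by type of pickle.
strata = {
    "kosher dill": [10.1, 10.4, 9.8, 10.0, 10.3, 9.9],
    "gherkin": [4.2, 4.0, 4.5, 3.9, 4.1, 4.3],
    "hamburger chip": [2.0, 2.2, 1.9, 2.1, 2.0, 2.3],
}
sample = stratified_sample(strata, 3, seed=0)
for name, lengths in sample.items():
    print(name, lengths)
```

Taking the same number from every stratum is only one possible design; the sample sizes could also be made proportional to the size of each stratum.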


CLUSTER SAMPLING GROUPS THE POPULATION INTO SMALL CLUSTERS, DRAWS A
SIMPLE RANDOM SAMPLE OF CLUSTERS, AND OBSERVES EVERYTHING IN THE
SAMPLED CLUSTERS. THIS CAN BE COST-EFFECTIVE IF TRAVEL COSTS BETWEEN
RANDOMLY SAMPLED UNITS ARE HIGH.

AN EXAMPLE IS A CITY HOUSING SURVEY WHICH DIVIDES A CITY INTO
BLOCKS, RANDOMLY SAMPLES THE BLOCKS, AND LOOKS AT EVERY HOUSING UNIT
IN EACH SAMPLED BLOCK.
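THE HOUSING SURVEY CAN BE SKETCHED THE SAME WAY. THE BLOCK NAMES, UNIT LABELS, AND THE FUNCTION `cluster_sample` ARE HYPOTHETICAL.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical city: each block lists all of its housing units.
city = {
    "block A": ["A1", "A2", "A3"],
    "block B": ["B1", "B2"],
    "block C": ["C1", "C2", "C3", "C4"],
    "block D": ["D1"],
}
surveyed = cluster_sample(city, 2, seed=5)
print(surveyed)
```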


EXCUSE ME... WOULD YOU MIND ANSWERING FIFTY OR SIXTY QUESTIONS?


SYSTEMATIC SAMPLING STARTS WITH A RANDOMLY CHOSEN UNIT AND THEN
SELECTS EVERY kTH UNIT THEREAFTER. FOR INSTANCE, A HIGHWAY TRAFFIC
STUDY MIGHT CHECK EVERY HUNDREDTH CAR AT A TOLL BOOTH. THIS PLAN IS
EASY TO IMPLEMENT AND CAN BE MORE EFFICIENT IF TRAFFIC PATTERNS VARY
SMOOTHLY OVER TIME.
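A SHORT SKETCH OF THE TOLL BOOTH PLAN, ASSUMING THE CARS ARRIVE AS AN ORDERED LIST. THE CAR NUMBERS AND THE NAME `systematic_sample` ARE MADE UP FOR ILLUSTRATION.

```python
import random

def systematic_sample(units, k, seed=None):
    """Start at a random position among the first k units, then take every k-th."""
    rng = random.Random(seed)
    start = rng.randrange(k)           # random starting point, 0 .. k-1
    return units[start::k]

cars = list(range(1, 1001))            # hypothetical: cars numbered in arrival order
checked = systematic_sample(cars, 100, seed=3)
print(checked)
```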


Word of warning #1:

MOST STATISTICAL METHODS DEPEND ON THE INDEPENDENCE AND LACK OF BIAS
OF THE SIMPLE RANDOM SAMPLE. THE RESULTS AHEAD APPLY TO THE SIMPLE
RANDOM SAMPLE ONLY. FOR OTHER SAMPLING PROCEDURES, THE RESULTS MUST BE
MODIFIED. THE DETAILS APPEAR IN SPECIALIZED SAMPLING TEXTBOOKS AND
COMPUTER ALGORITHMS.







Word of warning #2:

WITHOUT RANDOMIZED DESIGN, THERE CAN BE NO DEPENDABLE STATISTICAL
ANALYSIS, NO MATTER HOW IT IS MODIFIED. THE BEAUTY OF RANDOM
SAMPLING IS THAT IT "STATISTICALLY GUARANTEES" THE ACCURACY OF THE
SURVEY.

A COMMONLY USED METHOD IS ESPECIALLY PRONE TO BIAS. IT'S CALLED AN
OPPORTUNITY SAMPLE. AVOIDING ALL THE BOTHER OF DESIGNING A
PROCEDURE, THE OPPORTUNITY ...

A CLASSIC EXAMPLE IS SHERE HITE'S BOOK, WOMEN AND LOVE. 100,000
QUESTIONNAIRES WENT TO WOMEN'S ORGANIZATIONS (AN OPPORTUNITY
SAMPLE). ONLY 4.5% WERE FILLED OUT AND RETURNED (RESPONSE BIAS).

SO HER "RESULTS" WERE BASED ON A SAMPLE OF WOMEN WHO WERE HIGHLY
MOTIVATED TO ANSWER THE SURVEY'S QUESTIONS, FOR WHATEVER REASON.

AT LAST, A SCIENTIFIC WAY TO HUMILIATE ARNOLD!



SAMPLE SIZE & standard error

NOW LET'S GET DOWN TO BRASS TACKS... REAL BRASS TACKS, THAT IS.
SUPPOSE THE BERNOULLI TACK FACTORY IS CHURNING OUT BRASS TACKS,
SOME OF WHICH, INEVITABLY, ARE DEFECTIVE.

THE ASTUTE READER WILL RECOGNIZE THIS AS A BERNOULLI SYSTEM: EACH
NEW TACK IS THE OUTCOME OF A BERNOULLI TRIAL WITH SOME PROBABILITY p
OF SUCCESS (I.E., BEING DEFECT-FREE) AND PROBABILITY 1 - p OF FAILURE
(I.E., BEING DEFECTIVE).

WE THINK OF THIS SITUATION AS IF THERE WERE A HIDDEN BUT REAL
"BERNOULLI MACHINE" WHOSE PROBABILITY p GOVERNS THE OUTCOMES WE
OBSERVE IN THE SO-CALLED "REAL WORLD."
SINCE THE BERNOULLI MACHINE IS INVISIBLE, WE DON'T KNOW WHAT p IS.
BUT WE'D LIKE TO FIND OUT. SO WE TAKE A RANDOM SAMPLE OF n TACKS,
AND FIND THAT x OF THEM ARE O.K.

HMM... FEELS LIKE n = 1000 AND x = 832...

NOW THE PROPORTION OF SUCCESSES IN THE SAMPLE SHOULD BE SOMEWHERE
AROUND p... SO WE CALL IT $\hat{p}$, PRONOUNCED "P-HAT."

$\hat{p} = x / n$

$\hat{p}$ IS THE NUMBER OF SUCCESSES x IN THE SAMPLE, DIVIDED BY THE
SAMPLE SIZE n. FOR EXAMPLE, IF p WAS .85, AND WE SAMPLED n = 1000
TACKS, MAYBE WE FOUND x = 832 GOOD ONES, MAKING $\hat{p} = .832$.


WE ASK: HOW GOOD IS THIS ESTIMATE?

AND WE ANSWER WITH ANOTHER QUESTION: WHAT DOES THE FIRST
QUESTION MEAN?

WE CAN'T KNOW THE PRECISE DIFFERENCE BETWEEN p AND $\hat{p}$, BECAUSE
WE DON'T KNOW THE VALUE OF p. THE REAL QUESTION IS THIS: IF WE TOOK
MANY SAMPLES OF 1000 TACKS AND OBSERVED $\hat{p}$ FOR EACH SAMPLE, HOW
WOULD THOSE VALUES OF $\hat{p}$ BE DISTRIBUTED AROUND p?

IN FACT, THESE $\hat{p}$ VALUES ARE LOOKING MORE AND MORE LIKE A RANDOM
VARIABLE. THE SELECTION OF THE n-UNIT SAMPLE IS A RANDOM EXPERIMENT,
AND THE OBSERVATION $\hat{p}$ IS A NUMERICAL OUTCOME!



TO BE PRECISE, IF X IS THE NUMBER OF SUCCESSES IN THE SAMPLE, THEN
X IS NOTHING BUT OUR OLD FRIEND THE BINOMIAL RANDOM VARIABLE
(n TRIALS, PROBABILITY p), AND WE DEFINE THE OBSERVED PROPORTION TO
BE THE RANDOM VARIABLE

$\hat{P} = X / n$

BIG $\hat{P}$ IS THE RANDOM VARIABLE. LITTLE $\hat{p}$ IS ITS VALUE FOR A
PARTICULAR SAMPLE!



IAA 







KNOWING ALL ABOUT X, WE QUICKLY CONCLUDE A FEW FACTS ABOUT $\hat{P}$:

1) THE MEAN OF $\hat{P}$ IS $E[\hat{P}] = p$
2) THE STANDARD DEVIATION OF $\hat{P}$ IS
   $\sigma(\hat{P}) = \sqrt{p(1-p)} / \sqrt{n}$
3) FOR LARGE n, $\hat{P}$ IS APPROXIMATELY NORMAL.
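THE FIRST TWO FACTS CAN BE CHECKED BY BRUTE FORCE. THE SKETCH BELOW SIMULATES MANY SAMPLES FROM A MADE-UP "BERNOULLI MACHINE" (THE FUNCTION NAME, SEED, AND NUMBER OF REPETITIONS ARE ARBITRARY CHOICES) AND COMPARES THE RESULTS WITH THE FORMULAS.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, reps, seed=0):
    """Draw `reps` samples of n Bernoulli(p) trials; return each sample's p-hat."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.85, 1000                       # the tack factory numbers from the text
p_hats = simulate_p_hats(p, n, reps=2000)
theory_sd = math.sqrt(p * (1 - p) / n)  # sqrt(p(1-p)) / sqrt(n)

print("mean of p-hats:", statistics.fmean(p_hats), "theory:", p)
print("sd of p-hats:  ", statistics.stdev(p_hats), "theory:", theory_sd)
```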




AND THERE YOU HAVE IT ALL! THE OBSERVED VALUES OF $\hat{P}$ WILL BE
CENTERED ON p (NOT SURPRISINGLY), AND THEIR STANDARD DEVIATION, OR
SPREAD, IS PROPORTIONAL TO THAT MAGIC NUMBER WE MENTIONED AT THE
BEGINNING OF THE CHAPTER:

$1 / \sqrt{n}$

AND, SINCE $\hat{P}$ IS NEARLY NORMAL, WE CAN USE OUR RULE OF THUMB TO
CONCLUDE THAT APPROXIMATELY 68% OF ALL ESTIMATES WILL FALL WITHIN ONE
STANDARD DEVIATION OF THE TRUE VALUE p.




GOING BACK TO THE TACKS, WITH n = 1000 AND p = .85, WE GET A
STANDARD DEVIATION OF

$\sigma(\hat{P}) = \sqrt{(.85)(.15) / 1000} = .0113$

LOOKS A BIT LIKE ONE OF THOSE TACKS...

SO WE EXPECT ABOUT 68% OF OUR ESTIMATES TO FALL IN THE NARROW
INTERVAL

$.8387 \le \hat{p} \le .8613$
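THE ARITHMETIC ABOVE, AS A SMALL PYTHON CHECK. THE HELPER NAME `sd_of_proportion` IS INVENTED HERE. THE LAST LINE ILLUSTRATES THE $1/\sqrt{n}$ EFFECT: QUADRUPLING n HALVES THE SPREAD.

```python
import math

def sd_of_proportion(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.85, 1000
sd = sd_of_proportion(p, n)
print(round(sd, 4))                          # 0.0113
print(round(p - sd, 4), round(p + sd, 4))    # 0.8387 0.8613
print(sd_of_proportion(p, 4 * n) / sd)       # about 0.5
```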
MOST OF STATISTICS INVOLVES THE 4-STEP PROCESS WE'VE JUST WALKED
THROUGH:

1) DEFINE POPULATION WITH UNKNOWN PARAMETER.

2) FIND AN ESTIMATOR, ITS THEORETICAL SAMPLING DISTRIBUTION, AND
   STANDARD DEVIATION.

3) ACTUALLY DRAW A RANDOM SAMPLE AND FIND THE ESTIMATE.

4) REPORT THE RESULT AND ITS STATISTICAL OR SAMPLING ERROR.


WE HAVE $\hat{p} = .83$ WITH A SAMPLING ERROR OF 1.1%, MR. BERNOULLI.



Sampling Distribution of the MEAN

JAR MANUFACTURERS WOULD LIKE TO KNOW THE AVERAGE LENGTH OF A
PICKLE WITHOUT EXAMINING EVERY CUCUMBER IN CALIFORNIA. THEY RANDOMLY
SELECT n PICKLES AND MEASURE THEIR LENGTHS $X_1, X_2, \ldots, X_n$.

BY NOW YOU MAY BE USED TO THE IDEA THAT EACH $X_i$ IS A RANDOM
VARIABLE: THE NUMERICAL OUTCOME OF A RANDOM EXPERIMENT.

IF $\mu$ IS THE (UNKNOWN) MEAN PICKLE LENGTH, AND $\sigma$ IS THE
STANDARD DEVIATION OF THE PICKLE LENGTH DISTRIBUTION, THEN

$E[X_i] = \mu$

$\sigma(X_i) = \sigma$

FOR EVERY i (BECAUSE $X_i$ COULD HAVE BEEN THE LENGTH OF ANY PICKLE).


STRANGE, HOW MUCH WE KNOW ABOUT RANDOM VARIABLES WE DIDN'T EVEN
KNOW WERE VARIABLES A MINUTE AGO...

IS THERE ANYTHING THAT ISN'T A RANDOM VARIABLE?


r- - 

NOW WE LOOK AT THE SAMPLE MEAN: THE AVERAGE LENGTH OF THE SELECTED
PICKLES. IT'S A NEW RANDOM VARIABLE GIVEN BY

$\bar{X} = (X_1 + X_2 + \cdots + X_n) / n$

AS BEFORE, WE'D LIKE TO KNOW "HOW CLOSE" THIS IS TO $\mu$, MEANING: IF
THIS SAMPLING WERE DONE MANY TIMES, WHAT'S THE DISTRIBUTION OF
$\bar{X}$? BECAUSE WE KNOW ABOUT $X_1, X_2, \ldots$ AND $X_n$, WE ALSO
KNOW THAT

$E[\bar{X}] = \mu$ AND $\sigma(\bar{X}) = \sigma / \sqrt{n}$

BUT WE DON'T KNOW THE SHAPE OF $\bar{X}$'S DISTRIBUTION. THE
PROBABILITY DISTRIBUTION OF THE SAMPLE PROPORTION $\hat{P}$ WAS ALMOST
NORMAL, BECAUSE IT WAS BASED ON A BINOMIAL RANDOM VARIABLE. BUT WHAT
ABOUT $\bar{X}$, THE SAMPLE MEAN ESTIMATOR???












IT TURNS OUT THAT $\bar{X}$ IS ALSO APPROXIMATELY NORMAL! THIS FAMOUS
RESULT IS CALLED THE

CENTRAL LIMIT THEOREM

IT SAYS: IF ONE TAKES RANDOM SAMPLES OF SIZE n FROM A POPULATION OF
MEAN $\mu$ AND STANDARD DEVIATION $\sigma$, THEN, AS n GETS LARGE,
$\bar{X}$ APPROACHES THE NORMAL DISTRIBUTION WITH MEAN $\mu$ AND
STANDARD DEVIATION $\sigma / \sqrt{n}$. THEN

$\Pr(a \le \bar{X} \le b) \approx \Pr\left( \frac{a - \mu}{\sigma/\sqrt{n}} \le Z \le \frac{b - \mu}{\sigma/\sqrt{n}} \right)$

WHAT IS REMARKABLE ABOUT THIS? IT SAYS THAT REGARDLESS OF THE SHAPE
OF THE ORIGINAL DISTRIBUTION (IN THIS CASE, OF PICKLE LENGTHS), THE
TAKING OF AVERAGES RESULTS IN A NORMAL. TO FIND THE DISTRIBUTION OF
$\bar{X}$, WE NEED KNOW ONLY THE POPULATION MEAN AND STANDARD
DEVIATION.



THE THREE PROBABILITY DENSITIES ABOVE ALL HAVE THE SAME MEAN AND
STANDARD DEVIATION. DESPITE THEIR DIFFERENT SHAPES, WHEN n = 10, THE
SAMPLING DISTRIBUTIONS OF THE MEAN, $\bar{X}$, ARE NEARLY IDENTICAL.
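ONE WAY TO SEE THE THEOREM AT WORK IS TO AVERAGE DRAWS FROM A VERY NON-NORMAL POPULATION. THE SKETCH BELOW ASSUMES AN EXPONENTIAL POPULATION WITH $\mu = 1$ AND $\sigma = 1$ (AN ARBITRARY CHOICE, NOT THE PICKLE DATA) AND CHECKS THAT THE SAMPLE MEANS CENTER ON $\mu$ WITH SPREAD NEAR $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

def sample_means(draw, n, reps):
    """Return the means of `reps` samples, each made of n calls to draw()."""
    return [statistics.fmean([draw() for _ in range(n)]) for _ in range(reps)]

rng = random.Random(0)
mu, sigma, n = 1.0, 1.0, 30             # exponential(1): skewed, mean 1, sd 1
means = sample_means(lambda: rng.expovariate(1.0), n, reps=5000)

print("mean of sample means:", statistics.fmean(means), "theory:", mu)
print("sd of sample means:  ", statistics.stdev(means),
      "theory:", sigma / math.sqrt(n))
```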





The t-distribution 

AMAZING AS THE CENTRAL LIMIT THEOREM IS, IT HAS AT LEAST TWO
PROBLEMS:

ONE: IT DEPENDS ON A LARGE SAMPLE SIZE.

TWO: TO USE IT, WE NEED TO KNOW $\sigma$, THE STANDARD DEVIATION.

BUT SAMPLE SIZES ARE OFTEN SMALL, AND $\sigma$ IS USUALLY UNKNOWN.
CERTAINLY IN THE CASE OF THE PICKLES, WE HAVE NO IDEA HOW WIDELY
THEIR LENGTHS VARY AROUND THE AVERAGE.



WHAT WE CAN DO IN THIS CASE IS TO ESTIMATE $\sigma$ BY TAKING THE
STANDARD DEVIATION OF THE SAMPLE, WHICH, YOU'LL RECALL, IS GIVEN BY
THE FORMULA

$s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 }$

THEN, IN PLACE OF THE RANDOM VARIABLE

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

WE SUBSTITUTE S FOR $\sigma$, AND DEFINE A NEW RANDOM VARIABLE t BY

$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$




GOSSET, YOU IMPLY THAT OUR PRODUCT VARIES IN EXCELLENCE?

(GOSSET WAS EMPLOYED BY THE GUINNESS BREWERY, WHICH REQUIRED HIM TO
USE A PSEUDONYM, FOR SOME REASON.)

MAKING THE ASSUMPTION THAT THE ORIGINAL POPULATION DISTRIBUTION WAS
NORMAL, OR NEARLY NORMAL, "STUDENT" WAS ABLE TO CONCLUDE:

t IS MORE SPREAD OUT THAN Z. IT'S "FLATTER" THAN NORMAL. THIS IS
BECAUSE THE USE OF S INTRODUCES MORE UNCERTAINTY, MAKING t
"SLOPPIER" THAN Z.


[FIGURE: THE z DISTRIBUTION AND THE FLATTER t DISTRIBUTION]

YOU CAN THINK OF THE RANDOM VARIABLE t AS THE BEST WE CAN DO UNDER
THE CIRCUMSTANCES. ITS DISTRIBUTION IS CALLED STUDENT'S t BECAUSE ITS
INVENTOR, WILLIAM GOSSET, PUBLISHED UNDER THE PSEUDONYM "STUDENT."

THE AMOUNT OF SPREAD DEPENDS ON THE SAMPLE SIZE. THE GREATER THE
SAMPLE SIZE, THE MORE CONFIDENT WE CAN BE THAT S IS NEAR $\sigma$, AND
THE CLOSER t GETS TO z, THE NORMAL.

[FIGURE: THE NORMAL CURVE COMPARED WITH THE LARGE-SAMPLE t AND THE
SMALLER-SAMPLE t]

GOSSET WAS ABLE TO COMPUTE TABLES OF t FOR VARIOUS SAMPLE SIZES,
WHICH WE WILL SEE HOW TO USE IN THE FOLLOWING CHAPTER.


W THE 

\ 

HJ6TTHIUK ^ 
>F VJWW YOU\iC 
M.R£fct>y f' 
warmed? y 


fo8 








IN THIS CHAPTER, WE CONSIDERED A CENTRAL PROBLEM OF REAL-WORLD
STATISTICS: HOW TO SELECT A SAMPLE FROM A LARGE POPULATION SO THAT
STATISTICAL ANALYSIS CAN BE VALID. BESIDES THE "GOLD STANDARD" OF THE
SIMPLE RANDOM SAMPLE, WE ALSO DESCRIBED SOME OTHER SAMPLING SCHEMES
THAT ARE USED IN THE INTERESTS OF EFFICIENCY, COST, AND PRACTICALITY.

ON A SCALE OF 1 TO 5, HOW DO YOU FEEL ABOUT KEEPING PEOPLE WAITING?

WE FOUND THAT SAMPLE PROPORTIONS $\hat{p}$ WERE APPROXIMATELY NORMALLY
DISTRIBUTED, WHILE THE DISTRIBUTION OF THE SAMPLE MEAN $\bar{X}$
DEPENDED ON THE SAMPLE SIZE. FOR LARGE SAMPLES, THE DISTRIBUTION WAS
APPROXIMATELY NORMAL, WHILE FOR SMALL SAMPLES, WE USE THE STUDENT'S t
DISTRIBUTION.

IN THE NEXT TWO CHAPTERS, WE LOOK AT HOW TO USE THESE DISTRIBUTIONS
TO MAKE STATISTICAL INFERENCES. GIVEN A SINGLE OBSERVATION, LIKE A
POLITICAL POLL, HOW DO WE USE OUR KNOWLEDGE OF $\hat{p}$ AND $\bar{X}$
TO EVALUATE IT?
♦ Chapter 7 ♦

CONFIDENCE INTERVALS

IN THE LAST CHAPTER WE LOOKED AT SAMPLING: STARTING WITH A LARGE
POPULATION, WE IMAGINED TAKING MANY SAMPLES, AND WE DEDUCED HOW SOME
SAMPLE ESTIMATORS WERE DISTRIBUTED.

IN THIS CHAPTER, WE DO THE REVERSE: GIVEN ONE SAMPLE, WE ASK THE
QUESTION, WHAT WAS THE RANDOM SYSTEM THAT GENERATED ITS STATISTICS?






THIS SHIFT REPRESENTS A CHANGE IN OUR MODE OF THINKING, FROM
DEDUCTIVE REASONING TO INDUCTION.

IT'S LIKE A CRIMINAL INVESTIGATION, WATSON!

IN DEDUCTIVE REASONING, WE REASON FROM A HYPOTHESIS TO A CONCLUSION:
"IF LORD FASTBACK COMMITTED MURDER, THEN HE WOULD WIPE THE
FINGERPRINTS OFF THE GUN."

INDUCTIVE REASONING, BY CONTRAST, ARGUES BACKWARD FROM A SET OF
OBSERVATIONS TO A REASONABLE HYPOTHESIS.






ESTIMATING CONFIDENCE INTERVALS IS ONE OF THE MOST EFFECTIVE FORMS OF
STATISTICAL INFERENCE, AND ONE YOU SEE EVERY DAY BEFORE ELECTION
TIME...


M V 


/ HOLP MV 
V^At, WATSOVT 
v'M 60 /M 6 WTO 
THE BDLLIM6 , 
V BU^iNE^S'/ / 



IN A RECENT ELECTION SOMEWHERE, INCUMBENT SENATOR ASTUTE (ACCENT ON
THE LAST SYLLABLE, PLEASE!) COMMISSIONED A POLL BY BETTER HOLMES
RESEARCH. POLLSTER HOLMES DRAWS A SIMPLE RANDOM SAMPLE OF 1000
VOTERS AND ASKS THEM WHAT THEY THINK OF ASTUTE:

A) HE'S GOD'S GIFT TO HUMANITY
B) HE'S THE DEITY'S SPECIAL BLESSING ON MOST OF HUMANITY

AFTER CENSORING THE REMARKS OF A FEW GRUMPY OUTLIERS, HOLMES FINDS
THAT 550 VOTERS FAVOR HIS CLIENT, SENATOR ASTUTE.






YOU ONLY ASKED A THOUSAND PEOPLE?! BUT THERE ARE A MILLION VOTERS IN
THIS STATE!!

95% CERTAINTY?! THUNDER! WHAT DO YOU SUPPOSE WOULD HAPPEN IF I RAN
ON A PLATFORM OF 95% HONESTY?

I DON'T KNOW. IT'S NEVER BEEN TRIED...

AFTER ASTUTE CALMS DOWN, HOLMES EXPLAINS WHAT HE MEANS BY 95%
CONFIDENCE: HE KNOWS THAT HIS ESTIMATION PROCEDURE HAS A 95%
PROBABILITY OF PRODUCING AN INTERVAL CONTAINING p, I.E., IN HIS MANY
YEARS OF POLLING, p HAS FALLEN WITHIN THE CONFIDENCE INTERVAL AROUND
THE OBSERVED VALUE, $\hat{p}$, 95% OF THE TIME.









SENATOR ASTUTE IS STILL CONFUSED, SO HOLMES GIVES HIM AN ARCHERY
LESSON.

SHOOT! ANYTHING TO TAKE MY MIND OFF THEM DANG STATISTICS!

KNOWING THE ARCHER'S SKILL LEVEL, THE DETECTIVE DRAWS A CIRCLE WITH
10 CM RADIUS AROUND THE ARROW. HE NOW HAS 95% CONFIDENCE THAT HIS
CIRCLE INCLUDES THE CENTER OF THE BULL'S-EYE!

HE REASONED THAT IF HE DREW 10 CM RADIUS CIRCLES AROUND MANY ARROWS,
HIS CIRCLES WOULD INCLUDE THE CENTER 95% OF THE TIME.

[FIGURE: A CIRCLE DRAWN AROUND AN ARROW (X) THAT INCLUDES THE TRUE
CENTER]

(PROBABILISTS USE THE TERM STOCHASTIC TO DESCRIBE RANDOM MODELS. IT'S
DERIVED FROM THE GREEK STOCHAZESTHAI, MEANING TO AIM AT A TARGET, OR
GUESS, FROM STOCHOS, A TARGET.)
HOLMES NOW TRANSLATES THE ARCHERY LESSON INTO THE LANGUAGE WE
DEVELOPED LAST CHAPTER.

Step One: SHOOT A LOT OF ARROWS

A PROBABILITY CALCULATION FINDS THE WIDTH OF THE "BULL'S-EYE."

THE ESTIMATES $\hat{p}$ ARE OUR ARROWS. WE SAW THAT THE SAMPLING
DISTRIBUTION OF $\hat{p}$ IS NEARLY NORMAL WITH MEAN p AND STANDARD
DEVIATION

$\sigma(\hat{p}) = \sqrt{p(1-p)} / \sqrt{n}$

SINCE THE CURVE IS NORMAL, WE USE THE Z-TRANSFORM AND A STANDARD
TABLE TO FIND THE WIDTH OF THE INTERVAL WITHIN WHICH 95% OF THE
"ARROWS" HIT. (WE'LL SEE EXACTLY HOW TO DO THIS IN A FEW PAGES.) WE
FIND THIS WIDTH TO BE 1.96 STANDARD DEVIATIONS:

$.95 = \Pr(-1.96 \le Z \le 1.96)$

[FIGURE: NORMAL CURVE; 95% OF ARROWS LAND WITHIN THIS INTERVAL]




NOW WE DO SOME ALGEBRA. BY DEFINITION OF THE Z-TRANSFORM,

$.95 = \Pr\left( -1.96 \le \frac{\hat{p} - p}{\sigma(\hat{p})} \le 1.96 \right)$

WHICH BECOMES

$.95 = \Pr( p - 1.96\,\sigma(\hat{p}) \le \hat{p} \le p + 1.96\,\sigma(\hat{p}) )$

WHICH IS JUST ANOTHER WAY OF SAYING THAT 95% OF THE $\hat{p}$ "ARROWS"
LAND BETWEEN $p - 1.96\,\sigma(\hat{p})$ AND $p + 1.96\,\sigma(\hat{p})$.

NOW WE'RE IN A POSITION TO VIEW THE TARGET FROM BEHIND! ONE MORE
TURN OF THE ALGEBRA CRANK MAKES IT

$.95 = \Pr( \hat{p} - 1.96\,\sigma(\hat{p}) \le p \le \hat{p} + 1.96\,\sigma(\hat{p}) )$

HERE WE ARE DRAWING CIRCLES AROUND A LOT OF ARROWS (I.E., MAKING
INTERVALS AROUND $\hat{p}$) AND SAYING THAT 95% OF THEM COVER p.

BUT THERE IS ONE TINY PROBLEM... WE DON'T ACTUALLY KNOW THE SIZE OF
THE BULL'S-EYE, BECAUSE WE DON'T KNOW p, AND THE WIDTH IS A MULTIPLE
OF $\sigma(\hat{p})$.



THE CIRCLES ARE ALL DIFFERENT NOW, BUT IT'S OKAY, REALLY...

SO WE FUDGE A LITTLE AND USE THE STANDARD ERROR OF $\hat{p}$:

$SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})} / \sqrt{n}$

IN ITS PLACE. IT'S CLOSE ENOUGH... IT'S THE BEST WE CAN DO... AND IT
CAN EVEN BE THEORETICALLY JUSTIFIED!





NOW THE FORMULA IS

$.95 = \Pr( \hat{p} - 1.96\,SE(\hat{p}) \le p \le \hat{p} + 1.96\,SE(\hat{p}) )$

AGAIN, THIS EQUATION DESCRIBES THE PROBABILITY THAT THE TRUE, FIXED
POPULATION PROPORTION FALLS WITHIN THE RANDOM INTERVAL

$( \hat{p} - 1.96\,SE(\hat{p}),\ \hat{p} + 1.96\,SE(\hat{p}) )$.

IF WE SAMPLED REPEATEDLY, THESE INTERVALS WOULD COVER p 95% OF THE
TIME.

NOW OUR PROBABILITY CALCULATION IS DONE, AND IT'S TIME FOR...


Step Two: THE DETECTIVE WORK

IN A REAL POLL, HOLMES TAKES JUST ONE SIMPLE RANDOM SAMPLE OF 1000
VOTERS, FINDS $\hat{p} = .550$, AND WANTS TO INFER p.

HE MAKES USE OF STEP ONE TO COMPUTE

$SE(\hat{p}) = \sqrt{(.55)(.45) / 1000} = .0157$

HE CONCLUDES THAT WE CAN HAVE 95% CONFIDENCE THAT p IS WITHIN THE
RANGE

$\hat{p} \pm 1.96\,SE(\hat{p})$
$= .550 \pm (1.96)(.0157)$
$= .550 \pm .031$

THIS IS WHAT POLLS MEAN WHEN THEY REFER TO THEIR "MARGIN OF ERROR."
IN THIS CASE, HOLMES FOUND THAT

$.519 \le p \le .581$,

IN OTHER WORDS THAT p = 55% WITH A 3% MARGIN OF ERROR. (POLLS
TYPICALLY USE A 95% CONFIDENCE LEVEL.)
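HOLMES'S CALCULATION CAN BE REPRODUCED IN A FEW LINES. THIS IS A SKETCH, NOT A POLLING LIBRARY: THE FUNCTION NAME `proportion_ci` IS MADE UP, AND THE CRITICAL VALUE 1.96 IS SIMPLY PASSED IN AS A DEFAULT.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Return p-hat, its standard error, and the interval p-hat +/- z * SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat, se, (p_hat - margin, p_hat + margin)

p_hat, se, (low, high) = proportion_ci(550, 1000)
print(round(se, 4))                     # 0.0157
print(round(low, 3), round(high, 3))    # 0.519 0.581
```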




THE MARGIN OF ERROR WAS 3%, WHATEVER THAT MEANS...






THIS PAGE SHOWS THE RESULTS OF A COMPUTER SIMULATION OF TWENTY
SAMPLES OF SIZE n = 1000. WE ASSUMED THAT THE TRUE VALUE OF p = .5. AT
THE TOP YOU SEE THE SAMPLING DISTRIBUTION OF $\hat{p}$ (NORMAL, WITH
MEAN p), AND BELOW ARE THE 95% CONFIDENCE INTERVALS FROM EACH
SAMPLE. ON AVERAGE, ONE OUT OF TWENTY (OR 5%) OF THESE INTERVALS WILL
NOT COVER THE POINT p = .5.

[FIGURE: 95% CONFIDENCE INTERVALS FOR p FROM SAMPLES 1 TO 20, ON AN
AXIS RUNNING FROM 0.44 TO 0.56]
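A SIMULATION OF THIS KIND IS EASY TO SKETCH. THE VERSION BELOW IS AN ASSUMPTION-LADEN STAND-IN FOR THE ONE PICTURED (THE SEED, THE NUMBER OF REPETITIONS, AND THE NAME `coverage` ARE ARBITRARY). IT COUNTS HOW OFTEN THE INTERVAL $\hat{p} \pm 1.96\,SE(\hat{p})$ COVERS THE TRUE p.

```python
import math
import random

def coverage(p, n, reps, z=1.96, seed=0):
    """Fraction of simulated samples whose interval p-hat +/- z*SE covers p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z * se <= p <= p_hat + z * se:
            hits += 1
    return hits / reps

rate = coverage(p=0.5, n=1000, reps=2000)
print(rate)    # should land near 0.95
```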




ALTHOUGH 95% CONFIDENCE IS GOOD ENOUGH FOR NEWSPAPER POLLS, IT ISN'T
GOOD ENOUGH FOR SENATOR ASTUTE. HE WANTS 99%!

THE FIRST METHOD IS EQUIVALENT TO WIDENING THE CONFIDENCE INTERVAL.
THE GREATER THE MARGIN OF ERROR, THE MORE CERTAIN YOU ARE THE TRUE
VALUE OF p LIES IN THE INTERVAL.


MAYBE IT'S TIME TO SEE EXACTLY HOW WE FIND THE ENDS OF THESE
CONFIDENCE INTERVALS...

THE RELEVANT NUMBER HERE WE USUALLY CALL $\alpha$. IT MEASURES THE
DIFFERENCE BETWEEN THE DESIRED CONFIDENCE LEVEL AND CERTAINTY. FOR
EXAMPLE, WHEN THE CONFIDENCE LEVEL IS 95%, OR 0.95, $\alpha$ IS .05. SO
WE SPEAK OF THE $(1-\alpha) \cdot 100\%$ CONFIDENCE LEVEL.

FINDING THE $(1-\alpha) \cdot 100\%$ CONFIDENCE INTERVAL MEANS: LOOK AT
A STANDARD NORMAL CURVE, AND FIND THE POINTS $\pm z_{\alpha/2}$ BETWEEN
WHICH THE AREA IS $1-\alpha$.

WE CAN FIND $z_{\alpha/2}$ STRAIGHT FROM THE STANDARD NORMAL TABLE
(PAGE 64). IT'S THE POINT WITH THE PROPERTY

$\Pr(Z \ge z_{\alpha/2}) = \alpha/2$

IN PARTICULAR,

$\Pr(Z \ge z_{.025}) = .025$


   z     F(z)
 -2.5   0.006
 -2.4   0.008
 -2.3   0.011
 -2.2   0.014
 -2.1   0.018
 -2.0   0.023
 -1.9   0.029
 -1.8   0.036
 -1.7   0.045
 -1.6   0.055
 -1.5   0.067








HERE'S A LITTLE TABLE OF THE CRITICAL VALUES FOR VARIOUS LEVELS OF
CONFIDENCE:

$1-\alpha$        .80    .90    .95    .99
$\alpha$          .20    .10    .05    .01
$\alpha/2$        .10    .05    .025   .005
$z_{\alpha/2}$    1.28   1.64   1.96   2.58
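THE SAME CRITICAL VALUES CAN BE COMPUTED INSTEAD OF LOOKED UP. THE SKETCH BELOW USES `statistics.NormalDist` FROM THE PYTHON STANDARD LIBRARY (PYTHON 3.8 OR LATER). THE HELPER NAME `z_critical` IS INVENTED HERE.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}: the point with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 2))
```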


FOR THIS LEVEL OF CONFIDENCE, GO OUT THIS MANY STANDARD ERRORS TO GET
THE ANSWER.


TO MAKE A 99% CONFIDENCE INTERVAL, WE USE THAT TABLE TO WRITE

$.99 = \Pr( \hat{p} - 2.58\,SE(\hat{p}) \le p \le \hat{p} + 2.58\,SE(\hat{p}) )$

WHICH WE SLOPPILY ABBREVIATE AS

$p = \hat{p} \pm 2.58\,SE(\hat{p})$
$= .55 \pm (2.58)(.0157)$
$= .55 \pm .041$

WITH 99% CONFIDENCE.

GREAT! STILL OVER 50%!


WIDENING THE INTERVAL IS ONE WAY TO INCREASE OUR CONFIDENCE IN THE
RESULT. AS WE MENTIONED, ANOTHER WAY WOULD BE TO SHOOT OUR ARROWS
MORE ACCURATELY. IF WE KNEW THAT THE ARCHER GOT 95% OF HER ARROWS
WITHIN 1 CM OF THE BULL'S-EYE, OUR ESTIMATES COULD BE A LOT SHARPER!

HOW DO WE DO THIS? BY INCREASING THE SAMPLE SIZE! THE WIDTH OF THE
CONFIDENCE INTERVAL DEPENDS ON THE SAMPLE SIZE. THE INTERVAL HAS THE
FORM $\hat{p} \pm E$, WHERE E, THE ERROR, IS GIVEN BY

$E = z_{\alpha/2} \sqrt{ \hat{p}(1-\hat{p}) / n }$

SO THE BIGGER WE MAKE n, THE SMALLER THE ERROR. (E.G., QUADRUPLING n
HALVES THE INTERVAL WIDTH.)


ASTUTE ASKS HOLMES TO GIVE HIM A SMALL ERROR WITH HIGH CONFIDENCE,
SAY 99% CONFIDENCE WITH $E = \pm .01$. HOLMES SOLVES FOR n:

$n = \frac{ z_{\alpha/2}^2 \, p^*(1-p^*) }{ E^2 }$

(WHERE $p^*$ IS A GUESS AT THE TRUE PROPORTION p. REMEMBER, WE
HAVEN'T TAKEN THE SAMPLE YET!)




TAKING A CONSERVATIVE GUESS OF $p^* = .5$, HOLMES FINDS

$n = \frac{ (2.58)^2 (.5)^2 }{ (.01)^2 } = \frac{1.6641}{.0001} = 16{,}641$

1000 VOTERS GAVE A 3% ERROR WITH 95% CONFIDENCE. TO GET A 1% ERROR
WITH 99% CONFIDENCE, HOLMES HAS TO SAMPLE 16,641 VOTERS!
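THE SAMPLE SIZE FORMULA IS EASY TO WRAP IN A FUNCTION. THIS SKETCH USES THE TABLE VALUE z = 2.58 TO MATCH THE TEXT. THE NAME `sample_size` AND THE ROUNDING STEP (WHICH ONLY GUARDS AGAINST FLOATING-POINT NOISE BEFORE ROUNDING UP) ARE CHOICES MADE HERE, NOT PART OF THE ORIGINAL.

```python
import math

def sample_size(error, z, p_guess=0.5):
    """Smallest n with z * sqrt(p*(1-p*)/n) <= error, for a guessed p*."""
    n = z ** 2 * p_guess * (1 - p_guess) / error ** 2
    return math.ceil(round(n, 6))      # round away float noise, then round up

print(sample_size(0.01, z=2.58))       # 16641, Holmes's 99% / 1% poll
print(sample_size(0.031, z=1.96))      # 1000, roughly the original 95% / 3% poll
```

p_guess = 0.5 IS THE CONSERVATIVE CHOICE BECAUSE $p^*(1-p^*)$ IS LARGEST THERE.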


INTERVIEW— | \ * £A * a F, ^ C 
PAYABLE IN ) IT OUT.. ^ 

APVANCE- / [/"-— 



ON THE OTHER HAND, WHO CAN PLACE A VALUE ON PEACE OF MIND?

SO THEY DO THE POLL AND GO INTO THE ELECTION WITH CONFIDENCE.

BUT ALL THIS PROBABILITY STUFF IS ONLY GOOD BEFORE AN ELECTION.
AFTER THE ELECTION, THE SENATOR IS EITHER 100% IN OR 100% OUT! AND
DESPITE EVERYTHING, SENATOR ASTUTE LOSES THE ELECTION.

WHAT HAPPENED?

WHAT HAPPENED IS THAT POLITICIANS ARE NOT ELECTED BY POLLS!



1) RESPONSE BIAS: VOTERS MAY LIE TO THE INTERVIEWER OR CHANGE THEIR
   MINDS BEFORE ELECTION DAY.

   I LOVE BOTH MAJOR PARTIES AND ONLY WISH I COULD VOTE FOR BOTH OF
   THEM!

2) ALTHOUGH THE POLL IS AN UNBIASED SAMPLE OF POTENTIAL VOTERS, THE
   VOTING BOOTH COUNTS ONLY ACTUAL VOTERS.

3) NON-RESPONSE BIAS: THE VOTER MAY NOT BE HOME OR REFUSE TO TAKE
   PART IN THE POLL.

   BUT WASN'T THE ELECTION YESTERDAY?

M> 7? 





THERE IS NO WAY FOR A POLLSTER TO GET INSIDE A POTENTIAL VOTER'S
HEAD AND KNOW IF SHE'S GOING TO VOTE, IF SHE'S LYING, OR IF SHE'S
GOING TO CHANGE HER MIND BEFORE ELECTION DAY. LARGE SAMPLE SIZES
CANNOT REDUCE THESE KINDS OF ERRORS.









WHAT ABOUT THESE?

OUTSIDE OF TEXAS AND CHICAGO, I THINK WE'RE SAFE...

SINCE THESE ERRORS CAN BE LARGE, IT SELDOM PAYS TO TAKE A VERY LARGE
RANDOM SAMPLE.


INSTEAD W£ 

USE THfe 

ls$jWzer?' 


IN THE LAST FIVE PRESIDENTIAL ELECTIONS, THE GALLUP POLL HAS
INTERVIEWED FEWER THAN 4,000 VOTERS FOR EACH ELECTION. YET IN ALL
FIVE ELECTIONS, THE GALLUP ORGANIZATION'S ERRORS IN PREDICTING THE
PRESIDENTIAL ELECTION OUTCOME HAVE BEEN LESS THAN 2%.


'MP\/ INDUSTRIAL 

VJ HAT'S ( stpehgth 

s^TH AT ?/ 


THEIR SUCCESS IS DUE TO THEIR USE OF ESTIMATORS THAT ACCOUNT FOR
NON-RESPONSE, AND THEY SCREEN OUT ELIGIBLE VOTERS WHO ARE NOT LIKELY
TO VOTE.

TO SUMMARIZE, ESTIMATED PROPORTION = TRUE PROPORTION + BIAS + RANDOM
SAMPLING ERROR. EVEN POLLSTERS HAVE LIMITED FUNDS. THEY WISELY CHOOSE
TO SPEND THEIR MONEY REDUCING BIAS, RATHER THAN INCREASING THE
SAMPLES BEYOND 4,000 VOTERS.
Confidence Intervals for $\mu$

UP TO NOW, WE'VE BEEN LOOKING AT CONFIDENCE INTERVALS FOR A
PROPORTION p OF A POPULATION. EXACTLY THE SAME REASONING WORKS FOR
THE POPULATION MEAN $\mu$.

IN THE LAST CHAPTER (P. 105), WE SAW THAT THE DISTRIBUTION OF SAMPLE
MEANS $\bar{X}$ IS APPROXIMATELY NORMAL, CENTERED ON THE ACTUAL
POPULATION MEAN $\mu$, WITH STANDARD DEVIATION $\sigma/\sqrt{n}$, WHERE
$\sigma$ IS THE POPULATION STANDARD DEVIATION. SO, FOR LARGE n,

$.95 = \Pr(-1.96 \le Z \le 1.96)$
$= \Pr\left( -1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96 \right)$

TURNING THE SAME ALGEBRA CRANK AS BEFORE...

AGAIN, NOT KNOWING $\sigma$, WE REPLACE IT WITH s, THE SAMPLE STANDARD
DEVIATION:

$.95 = \Pr\left( -1.96 \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le 1.96 \right)$

THE TERM $s/\sqrt{n}$ IS CALLED THE SAMPLE STANDARD ERROR, AND WRITTEN
$SE(\bar{X})$. WE CONCLUDE THAT

$.95 = \Pr( \bar{X} - 1.96\,SE(\bar{X}) \le \mu \le \bar{X} + 1.96\,SE(\bar{X}) )$

WHERE

$SE(\bar{X}) = s / \sqrt{n}$
JUST AS BEFORE, WE HAVE FOUND THAT THE RANDOM INTERVAL

$\bar{X} \pm 1.96\,SE(\bar{X})$

COVERS THE TRUE MEAN $\mu$ WITH PROBABILITY .95... SO NOW WE CAN CALL
IN SHERLOCK HOLMES TO MAKE A STATISTICAL INFERENCE BASED ON A SINGLE
SAMPLE OF SIZE n WITH MEAN $\bar{x}$.

AS BEFORE, FOR AN ARBITRARY LEVEL OF CONFIDENCE $1-\alpha$, WE REPLACE
1.96 BY $z_{\alpha/2}$.

[FIGURE: NORMAL CURVE WITH CENTRAL AREA $1-\alpha$]





LET'S REVISIT THE STUDENT WEIGHT DATA FROM CHAPTER 2, ASSUMING THAT
THE n = 92 STUDENTS WERE A SIMPLE RANDOM SAMPLE OF ALL PENN STATE
STUDENTS.

THE SAMPLE MEAN $\bar{x}$ WAS 145.2 LBS AND SAMPLE STANDARD DEVIATION
s WAS 23.7. SO THE STANDARD ERROR IS

$SE(\bar{x}) = 23.7 / \sqrt{92} = 2.47$

AND WE NOW HAVE 95% CONFIDENCE THAT THE MEAN WEIGHT OF ALL PENN STATE
STUDENTS FALLS IN THE INTERVAL

$\bar{x} \pm 1.96\,SE(\bar{x})$
$= 145.2 \pm (1.96)(2.47)$
$= 145.2 \pm 4.8$ POUNDS

TO SUMMARIZE: FOR A SIMPLE RANDOM SAMPLE (SRS) OF LARGE SIZE, THE
$(1-\alpha) \cdot 100\%$ CONFIDENCE INTERVAL IS:

POPULATION MEAN, $\mu$:
$\mu = \bar{x} \pm z_{\alpha/2}\,SE(\bar{x})$, WHERE $SE(\bar{x}) = s/\sqrt{n}$

POPULATION PROPORTION, p:
$p = \hat{p} \pm z_{\alpha/2}\,SE(\hat{p})$, WHERE $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$

THE SIZE OF BOTH INTERVALS IS CONTROLLED BY THE LEVEL OF CONFIDENCE
$(1-\alpha) \cdot 100\%$ AND THE SAMPLE SIZE, n.
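THE STUDENT WEIGHT INTERVAL CAN BE CHECKED FROM THE THREE SUMMARY NUMBERS ALONE. A MINIMAL SKETCH (THE NAME `mean_ci` IS INVENTED HERE, AND IT ASSUMES A LARGE SIMPLE RANDOM SAMPLE, AS THE TEXT DOES):

```python
import math

def mean_ci(x_bar, s, n, z=1.96):
    """Large-sample interval for the mean: x_bar +/- z * s / sqrt(n)."""
    se = s / math.sqrt(n)
    return se, (x_bar - z * se, x_bar + z * se)

se, (low, high) = mean_ci(145.2, 23.7, 92)
print(round(se, 2))                     # 2.47
print(round(low, 1), round(high, 1))    # 140.4 150.0
```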
Student's t (again!)

AS WE SAW IN CHAPTER 6, THE STATISTIC

$\frac{\bar{X} - \mu}{SE(\bar{X})}$

HAS AN APPROXIMATELY NORMAL DISTRIBUTION ONLY WHEN IT IS COMPUTED
USING A LARGE SAMPLE.

FOR SMALL SAMPLES (n = 5, 10, 25...), THIS IS NO LONGER THE CASE, AND
WE HAVE TO USE THE STUDENT'S t.

LET'S LOOK AT t A LITTLE MORE CLOSELY. WE MENTIONED THAT THE t
DISTRIBUTION IS MORE SPREAD OUT THAN THE NORMAL, AND THAT THE
AMOUNT OF SPREAD DEPENDS ON THE SAMPLE SIZE.



WHAT ITS DISCOVERER GOSSET DID WAS TO QUANTIFY THIS RELATIONSHIP. IF
n IS THE SAMPLE SIZE, HE SAID, THEN CALL n-1 THE NUMBER OF

degrees of freedom

OF THE SAMPLE.

THE GENERAL IDEA: GIVEN n PIECES OF DATA $x_1, \ldots, x_n$, YOU USE UP
ONE "DEGREE OF FREEDOM" WHEN YOU COMPUTE $\bar{x}$, LEAVING n-1
INDEPENDENT PIECES OF ...





GOSSET COMPUTED TABLES OF THE t DISTRIBUTION FOR DIFFERENT SAMPLE
SIZES, I.E., DEGREES OF FREEDOM. WE REPEAT, THE MORE DEGREES OF
FREEDOM, THE CLOSER t BECOMES TO THE STANDARD NORMAL.

A NICE, SLOPPY DISTRIBUTION!

KNOWING THE SAMPLE SIZE n, WE CHOOSE THE t DISTRIBUTION WITH n-1
DEGREES OF FREEDOM.


AS WITH THE z DISTRIBUTION (I.E., THE STANDARD NORMAL), WE GET A 95%
CONFIDENCE LEVEL BY FINDING THE CRITICAL VALUE $t_{.025}$ BEYOND WHICH
THE AREA UNDER THE CURVE IS .025.

[FIGURE: t CURVE WITH CENTRAL AREA = .95]

SINCE THE CURVE IS FLATTER THAN NORMAL, $t_{.025}$ IS FARTHER FROM 0
THAN $z_{.025}$.



FOR A $(1-\alpha) \cdot 100\%$ CONFIDENCE INTERVAL, WE FIND THE
CRITICAL VALUE $t_{\alpha/2}$ SUCH THAT
$\Pr(t \ge t_{\alpha/2}) = \alpha/2$. HERE IS A SHORT TABLE OF CRITICAL
VALUES FOR THE t DISTRIBUTION:

                $1-\alpha$    .80     .90     .95     .99
                $\alpha$      .20     .10     .05     .01
                $\alpha/2$    .10     .05     .025    .005
DEGREES OF
FREEDOM
    1                         3.08    6.31    12.71   63.66
    10                        1.37    1.81    2.23    3.17
    30                        1.31    1.70    2.04    2.75
    100                       1.29    1.66    1.98    2.63
    $\infty$                  1.28    1.64    1.96    2.58




EACH COLUMN REPRESENTS A FIXED LEVEL OF CONFIDENCE, WITH INCREASING
NUMBERS OF DEGREES OF FREEDOM. THE HIGHER THE DEGREES OF FREEDOM,
THE CLOSER THE CRITICAL VALUE GETS TO $z_{\alpha/2}$, THE CRITICAL
VALUE OF THE NORMAL DISTRIBUTION.


WE DERIVE THE WIDTH OF OUR CONFIDENCE INTERVAL DIRECTLY FROM THE
DEFINITION OF t:

$t = \frac{\bar{X} - \mu}{SE(\bar{X})}$

THEN, FOR CONFIDENCE LEVEL $1-\alpha$,

$(1-\alpha) = \Pr( \bar{X} - t_{\alpha/2}\,SE(\bar{X}) \le \mu \le \bar{X} + t_{\alpha/2}\,SE(\bar{X}) )$

NOTE IT'S EXACTLY LIKE THE CASE OF A LARGE SAMPLE, BUT WITH t INSTEAD
OF z!

FROM WHICH WE INFER: GIVEN A SINGLE SAMPLE OF SIZE n AND MEAN
$\bar{x}$, WE CAN BE $(1-\alpha) \cdot 100\%$ CONFIDENT THAT THE
POPULATION MEAN $\mu$ FALLS IN THE RANGE

$\mu = \bar{x} \pm t_{\alpha/2}\,SE(\bar{x})$

WHERE $SE(\bar{x}) = s/\sqrt{n}$ AND $t_{\alpha/2}$ IS THE CRITICAL
VALUE OF THE t DISTRIBUTION WITH n-1 DEGREES OF FREEDOM.

YOU WILL MEMORIZE THIS.



STRICTLY SPEAKING, THE DERIVATION OF THE t DISTRIBUTION DEPENDED ON
THE ASSUMPTION THAT THE SAMPLE WAS FROM A NORMAL POPULATION. IN
PRACTICE, CONFIDENCE INTERVALS BASED ON THE t WORK REASONABLY WELL,
EVEN WHEN THE POPULATION DISTRIBUTION IS ONLY APPROXIMATELY
MOUND-SHAPED.





EXAMPLE: SUPPOSE CHAMELEON MOTORS HAS TO CRASH TEST ITS CARS TO
DETERMINE THE AVERAGE REPAIR COST OF A 10 M.P.H. HEAD-ON COLLISION.
THIS IS EXPENSIVE! THEY DECIDE TO TRY IT ON JUST FIVE CARS.

THEY FIND THE DAMAGE DATA TO BE $150, $400, $720, $500, AND $930.

THE SAMPLE MEAN: $\bar{x} = \$540$

THE STANDARD DEVIATION: $s = \$299$

YOU CAN CHECK s WITH A HAND CALCULATOR. IT'S

$s = \sqrt{ \tfrac{1}{4} \left[ (150-540)^2 + (400-540)^2 + (720-540)^2 + (500-540)^2 + (930-540)^2 \right] }$

SO WHERE CAN WE PLACE THE MEAN WITH 95% CONFIDENCE? WE FIND OUR
CRITICAL VALUE $t_{.025}$ WITH 4 DEGREES OF FREEDOM:


                $1-\alpha$    .80     .90     .95     .99
                $\alpha$      .20     .10     .05     .01
                $\alpha/2$    .10     .05     .025    .005
DEGREES OF
FREEDOM
    1                         3.08    6.31    12.71   63.66
    2                         1.89    2.92    4.30    9.92
    3                         1.64    2.35    3.18    5.84
    4                         1.53    2.13    2.78    4.60
    5                         1.48    2.01    2.57    4.03





AND PLUG IT IN:

$\mu = \bar{x} \pm 2.78\,(s/\sqrt{n})$
$= 540 \pm 2.78\,(299/\sqrt{5})$
$= 540 \pm 372$

SO THE BEST WE CAN SAY WITH 95% CONFIDENCE IS THAT THE AVERAGE
DAMAGE WILL LIE BETWEEN $168 AND $912.
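THE CRASH-TEST NUMBERS CAN BE RE-DERIVED FROM THE FIVE DATA POINTS. THE SKETCH BELOW TAKES THE CRITICAL VALUE 2.78 (4 DEGREES OF FREEDOM) FROM THE t TABLE AS AN INPUT, SINCE PYTHON'S STANDARD LIBRARY HAS NO t DISTRIBUTION. THE NAME `t_interval` IS INVENTED HERE.

```python
import math
import statistics

def t_interval(data, t_crit):
    """Small-sample interval: mean +/- t_crit * s / sqrt(n)."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)          # divides by n - 1, as in the text
    margin = t_crit * s / math.sqrt(len(data))
    return x_bar, s, (x_bar - margin, x_bar + margin)

damage = [150, 400, 720, 500, 930]      # repair costs in dollars
x_bar, s, (low, high) = t_interval(damage, t_crit=2.78)
print(x_bar, round(s))                  # 540 299
print(round(low), round(high))          # 168 912
```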




TO COMPUTE THIS CONFIDENCE INTERVAL USING STUDENT'S t WE HAVE MADE
AN UNSTATED ASSUMPTION. WE ASSUMED THAT CRASH REPAIR COSTS ARE
APPROXIMATELY NORMALLY DISTRIBUTED, I.E., IF WE CRASHED 1000
CHAMELEONS, THE HISTOGRAM OF REPAIR COSTS WOULD BE SYMMETRICAL AND
MOUND-SHAPED. WE CANNOT KNOW THIS FROM 5 DATA POINTS ALONE. BUT
MAYBE YEARS OF EXPERIENCE WITH EARLIER MODELS PROVIDE NORMALLY
DISTRIBUTED COST HISTOGRAMS FOR FRONT END REPAIRS: INFORMATION WHICH
WOULD TEND TO SUPPORT OUR USE OF STUDENT'S t.
TO SUM UP (!), WE NOW HAVE THREE SIMPLE RECIPES FOR FINDING
CONFIDENCE INTERVALS. FOR PROPORTIONS, OR MEANS WITH LARGE SAMPLE
SIZES, WE LOOK UP $z_{\alpha/2}$ IN A NORMAL TABLE. FOR MEANS OF SMALL
SAMPLE SIZES (SAY $n \le 10$), WE FIND $t_{\alpha/2}$ IN THE t TABLE.

IN ALL CASES, THE WIDTH OF THE INTERVAL IS THAT CRITICAL VALUE TIMES
THE STANDARD ERROR:

$z_{\alpha/2}\,SE(\hat{p})$      $z_{\alpha/2}\,SE(\bar{x})$      $t_{\alpha/2}\,SE(\bar{x})$

AND EACH OF THOSE STANDARD ERRORS IS PROPORTIONAL TO THAT MAGIC
NUMBER:

$1 / \sqrt{n}$







♦ Chapter 8 ♦

HYPOTHESIS TESTING

NOW WE ENTER A NEW AREA... GOVERNMENT, BUSINESS, AND THE HARD AND
SOFT SCIENCES ALL USE AND OFTEN ABUSE THESE TESTS OF SIGNIFICANCE.
IT'S ALL ABOUT ANSWERING THE QUESTION, "COULD THESE OBSERVATIONS
REALLY HAVE OCCURRED BY CHANCE?"
WE BEGIN WITH AN EXAMPLE FROM THE LAW: A COMPOSITE OF SEVERAL CASES
ARGUED IN THE SOUTH BETWEEN 1960 AND 1980, IN WHICH EXPERT
WITNESSES PRESENTED THE CASE FOR RACIAL BIAS IN JURY SELECTION.



/ 



PANELS OF JURORS ARE THEORETICALLY DRAWN AT RANDOM FROM A LIST OF
ELIGIBLE CITIZENS. HOWEVER, IN SOUTHERN STATES IN THE 50S AND 60S,
FEW AFRICAN AMERICANS WERE FOUND ON JURY PANELS, SO SOME DEFENDANTS
CHALLENGED THE VERDICTS. ON APPEAL, AN EXPERT STATISTICAL WITNESS
GAVE THIS EVIDENCE:

1) 50% OF ELIGIBLE CITIZENS WERE AFRICAN AMERICAN.

2) ON AN 80-PERSON PANEL OF POTENTIAL JURORS, ONLY FOUR WERE AFRICAN
   AMERICANS.

COULD THIS BE THE RESULT OF PURE CHANCE?




FOR THE SAKE OF ARGUMENT, SUPPOSE THAT THE SELECTION OF POTENTIAL
JURORS WAS RANDOM. THEN THE NUMBER OF AFRICAN AMERICANS ON THE
80-PERSON PANEL WOULD BE THE BINOMIAL RANDOM VARIABLE X WITH n = 80
TRIALS AND p = .5.

THUS, THE CHANCE OF GETTING A JURY WITH ONLY 4 AFRICAN AMERICANS IS
$\Pr(X \le 4)$, WHICH WORKS OUT TO ABOUT

.0000000000000000014

SINCE THE PROBABILITY IS SO SMALL, THE PARTICULAR PANEL WITH ONLY
FOUR BLACK MEMBERS IS STRONG EVIDENCE AGAINST THE HYPOTHESIS OF
RANDOM SELECTION.

TO DRIVE THE POINT HOME, THE STATISTICIAN NOTES THAT THIS
PROBABILITY IS LESS THAN THE CHANCES OF GETTING THREE CONSECUTIVE
ROYAL FLUSHES IN POKER.

SO THE JUDGE REJECTS THE HYPOTHESIS OF RANDOM SELECTION.
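BOTH TINY PROBABILITIES CAN BE COMPUTED EXACTLY WITH THE STANDARD LIBRARY. THE SKETCH BELOW ASSUMES A 52-CARD DECK WITH FOUR POSSIBLE ROYAL FLUSHES AMONG ALL FIVE-CARD HANDS. THE FUNCTION NAME `binomial_cdf` IS INVENTED HERE.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Pr(X <= k) for a binomial random variable with n trials, probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

p_value = binomial_cdf(4, 80, 0.5)      # four or fewer out of 80
royal_flush = 4 / comb(52, 5)           # one royal flush in a 5-card deal
three_in_a_row = royal_flush ** 3       # three consecutive royal flushes

print(p_value)                          # about 1.4e-18
print(three_in_a_row)                   # about 3.6e-18
print(p_value < three_in_a_row)         # True
```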
LET'S FOLLOW THE PROCESS AGAIN TO SORT OUT THE FOUR FORMAL STEPS OF
STATISTICAL HYPOTHESIS TESTING.

Step 1. FORMULATE ALL HYPOTHESES.

$H_0$, THE NULL HYPOTHESIS, IS USUALLY THAT THE OBSERVATIONS ARE THE
RESULT PURELY OF CHANCE.

$H_a$, THE ALTERNATE HYPOTHESIS, IS THAT THERE IS A REAL EFFECT, THAT
THE OBSERVATIONS ARE THE RESULT OF THIS REAL EFFECT, PLUS CHANCE
VARIATION.

IN THE COURT CASE, $H_0$ SAYS THE JURY WAS RANDOMLY CHOSEN FROM THE
WHOLE POPULATION. AFRICAN AMERICANS HAVE PROBABILITY p = .50 OF BEING
CHOSEN.

$H_a$ SAYS THAT AFRICAN AMERICANS ARE LESS LIKELY THAN THEIR
PROPORTION IN THE POPULATION TO BE SELECTED FOR A JURY PANEL:
p < .50.

Step 2. THE TEST STATISTIC.
IDENTIFY A STATISTIC THAT WILL ASSESS THE EVIDENCE AGAINST THE NULL
HYPOTHESIS.

IN THE COURT CASE, THE TEST STATISTIC IS THE BINOMIAL RANDOM
VARIABLE X WITH p = .50 AND n = 80.





IN THE JURY CASE, THE STATISTICIAN TOOK $\alpha$ TO BE
$3.6 \times 10^{-18}$, THE CHANCES OF BEING DEALT THREE CONSECUTIVE
ROYAL FLUSHES.

IN THE EXAMPLE, THE P-VALUE WAS

$\Pr(X \le 4 \mid p = .5 \text{ AND } n = 80) = 1.4 \times 10^{-18}$

WE COMPUTED THIS P-VALUE THE MODERN WAY, USING A STATISTICAL
SOFTWARE PACKAGE.

IN THE '50S WE USED HORSE-DRAWN COMPUTERS!

A ROYAL FLUSH AGAIN??

IN LEGAL PROCEEDINGS, THE STANDARD IS MORE FLEXIBLE.

IN SCIENTIFIC WORK, A FIXED $\alpha$-LEVEL OF .05 OR .01 IS OFTEN USED.
THESE FIXED LEVELS ARE A HOLDOVER ARTIFACT FROM THE PRE-COMPUTER ERA,
WHEN WE HAD TO REFER TO TABLES, WHICH WERE PRINTED ONLY FOR SELECTED
CRITICAL VALUES. STILL, MANY SCIENTIFIC JOURNALS CONTINUE TO PUBLISH
RESULTS ONLY WHEN THE P-VALUE $\le .05$.




LARGE SAMPLE 

SIGNIFICANCE TEST FOR 
PROPORTIONS 

THE JURy EXAMPLE WAS A SPECIAL GASE 
OF A GENERAL PROBLEM THE NULL 
HyPOTHESlS HAP THE FORM p - po, 

WHERE p 0 WAS SOME PROBABILITY ON 
THIS GASE, 5 ). NOW LETS LOOK AT 
SUGH PROBLEMS 6-ENERALLy: LET 5 
TEST THE HYPOTHEC p ~ po . 



AS USUAL, WE IMAGINE WE HAVE A BIG POPULATION... WE OBSERVE A LARGE
SAMPLE... AND WE FIND THAT SOME CHARACTERISTIC OCCURS WITH
PROBABILITY $\hat{p}$.

BASED ON THIS OBSERVATION, WE WANT TO KNOW IF THE TRUE
POPULATION PROBABILITY $p$ IS (FOR INSTANCE) LARGER THAN SOME OTHER
VALUE $p_0$. FOR EXAMPLE, SENATOR ASTUTE, HAVING FOUND A $\hat{p}$ OF .55,
WOULD LIKE TO KNOW THAT $p > .50$: A WINNING MAJORITY.

Step 1.

THE NULL HYPOTHESIS IS

$H_0: p = p_0$

THE ALTERNATE HYPOTHESIS DEPENDS
ON THE DIRECTION OF THE EFFECT
WE ARE LOOKING FOR. IN SENATOR
ASTUTE'S CASE,

$H_a: p > p_0$

BUT IN OTHER CASES, THE ALTERNATE
HYPOTHESIS MIGHT WELL BE

$H_a: p < p_0$
OR
$H_a: p \ne p_0$

FOR EXAMPLE, IN THE JURY SELECTION
EXAMPLE, THE ALTERNATIVE
HYPOTHESIS WAS

$H_a: p < .5$

AND AT OTHER TIMES, WE ARE
INTERESTED IN KNOWING THAT $p$ IS
DIFFERENT FROM SOME VALUE $p_0$.

FOR INSTANCE, IN TESTING FOR A
FAIR COIN, WE HAVE AN ALTERNATE
HYPOTHESIS OF

$H_a: p \ne 0.5$

BUT HAVE NO A PRIORI OPINION
ABOUT WHETHER HEADS OR TAILS
WILL COME UP MORE OFTEN.


Step 2. THE TEST STATISTIC IS

$z_{obs} = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$

WHICH MEASURES HOW FAR $\hat{p}$ DEVIATES
FROM $p_0$. UNDER THE NULL
HYPOTHESIS, $Z_{obs}$ HAS THE STANDARD
NORMAL DISTRIBUTION.

Step 3. THE P-VALUE DEPENDS
ON WHICH ALTERNATE HYPOTHESIS IS
RELEVANT.

a) "RIGHT-HANDED" $H_a: p > p_0$
USES P-VALUE $\Pr(Z > z_{obs})$

b) "LEFT-HANDED" $H_a: p < p_0$
USES P-VALUE $\Pr(Z < z_{obs})$

c) "TWO-SIDED" $H_a: p \ne p_0$
USES P-VALUE $\Pr(|Z| > |z_{obs}|)$
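
THE STEPS ABOVE CAN BE SKETCHED IN A FEW LINES OF PYTHON. THIS IS ONLY
AN ILLUSTRATION: THE FUNCTION NAMES ARE OURS, THE NORMAL TAIL IS
COMPUTED WITH `math.erfc` RATHER THAN A PRINTED TABLE, AND THE SAMPLE
VALUES (A PROPORTION OF .55 FROM AN ASSUMED 1000 VOTERS, TESTED
AGAINST .50) ARE SENATOR ASTUTE'S FIGURES AS BEST THEY CAN BE READ.

```python
from math import erfc, sqrt


def normal_upper_tail(z: float) -> float:
    """Pr(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))


def proportion_z_test(p_hat: float, p0: float, n: int, alternative: str = "right"):
    """Large-sample z test of H0: p = p0.

    alternative: "right" (p > p0), "left" (p < p0) or "two-sided" (p != p0).
    Returns the pair (z_obs, p_value).
    """
    z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "right":
        p_value = normal_upper_tail(z_obs)
    elif alternative == "left":
        p_value = normal_upper_tail(-z_obs)  # Pr(Z < z) equals Pr(Z > -z)
    elif alternative == "two-sided":
        p_value = 2 * normal_upper_tail(abs(z_obs))
    else:
        raise ValueError("alternative must be 'right', 'left' or 'two-sided'")
    return z_obs, p_value


z_obs, p_value = proportion_z_test(p_hat=0.55, p0=0.50, n=1000, alternative="right")
print(f"z_obs = {z_obs:.2f}, P-value = {p_value:.4f}")
```

SWITCHING `alternative` TO "left" OR "two-sided" REUSES THE SAME
$z_{obs}$ AND CHANGES ONLY WHICH TAIL AREA IS REPORTED.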







IN THE CASE OF SENATOR ASTUTE:

1) THE HYPOTHESES ARE

$H_0: p = .5$
$H_a: p > .5$

2) HIS TEST STATISTIC IS

$z_{obs} = \dfrac{.55 - .50}{\sqrt{(.5)(.5)/1000}} = 3.16$

3) HIS P-VALUE IS

$\Pr(Z > z_{obs}) = \Pr(Z \ge 3.16) = .0008$

(FROM THE NORMAL TABLE).

4) ASTUTE, BEING FAIRLY CONSERVATIVE,
TAKES A SIGNIFICANCE LEVEL $\alpha$ OF .01
AND OBSERVES THAT

$\Pr(Z > z_{obs}) = .0008 < \alpha$

THE SENATOR THUS REJECTS
THE NULL HYPOTHESIS, AND
HE (AND HIS BANKERS) NOW
FEEL CERTAIN HE'S IN THE
LEAD.


LARGE SAMPLE
TEST FOR THE
POPULATION MEAN


HERE IS HOW A SIGNIFICANCE TEST
MIGHT BE USED IN INSPECTION
SAMPLING, AN IMPORTANT INDUSTRIAL
APPLICATION.


NEW AGE GRANOLA INC. CLAIMS THAT
THE AVERAGE WEIGHT OF ITS CEREAL
BOXES IS AT LEAST 16 OZ. THE GENUINE
GROCERY CORPORATION WILL SEND BACK
A SHIPMENT IF THE AVERAGE WEIGHT IS
ANY LESS.





FIRST, THEY CHOOSE THEIR
HYPOTHESES.

$H_0: \mu \ge 16$ OZ.

$H_a: \mu < 16$ OZ.

REJECTING THE NULL
HYPOTHESIS
MEANS REJECTING THE GRANOLA.



NEXT, THEY CHOOSE A TEST STATISTIC. BY NOW, IT SHOULD BE PRETTY MUCH A
KNEE-JERK REACTION TO KNOW THAT THE SAMPLE SPREAD FROM THE MEAN IS

$Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$

WHERE $s$ IS THE SAMPLE
STANDARD DEVIATION. UNDER
THE NULL HYPOTHESIS, THIS
APPROXIMATES THE
STANDARD NORMAL WHEN
THE SAMPLE IS LARGE, BY
THE CENTRAL LIMIT
THEOREM.



SKIPPING OVER STEP 3 FOR A MOMENT, THEY SET A SIGNIFICANCE LEVEL. BEING A
BUNCH OF DROPPED-OUT SCIENCE MAJORS, THE GROCERS THINK $\alpha = .05$ SOUNDS
ABOUT RIGHT.


JUST THEN, A BOXCAR
LOADED WITH 10,000
BOXES OF GRANOLA
ARRIVES AT THE DOOR.


THEY PULL OUT A
SIMPLE RANDOM
SAMPLE OF 49 BOXES,
WEIGH EACH ONE, AND
DETERMINE THE
SAMPLE'S SUMMARY
STATISTICS:

$\bar{x} = 15.90$ OZ.

$s = .35$ OZ.

A LITTLE LIGHT, BUT
SIGNIFICANTLY SO?



THEY PLUG THE VALUES INTO THE TEST STATISTIC TO FIND

$z_{obs} = \dfrac{15.9 - 16}{.35/\sqrt{49}} = -2$

NOW THEY COMPUTE THE P-VALUE

$\Pr(Z < -2 \mid H_0) = .0228$

THIS BEING LESS THAN THE .05
SIGNIFICANCE LEVEL, GENUINE GROCERY
REJECTS THE NULL HYPOTHESIS, AND
THE SHIPMENT.
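
THE GRANOLA ARITHMETIC ABOVE CAN BE REPRODUCED WITH A SHORT PYTHON
SKETCH. THE FUNCTION NAME `mean_z_test_left` IS OUR OWN LABEL, AND THE
NORMAL TAIL AREA IS COMPUTED FROM `math.erfc` INSTEAD OF A TABLE.

```python
from math import erfc, sqrt


def mean_z_test_left(x_bar: float, s: float, n: int, mu0: float):
    """Large-sample test of H0: mu >= mu0 against Ha: mu < mu0.

    Returns (z_obs, p_value), where p_value = Pr(Z < z_obs).
    """
    z_obs = (x_bar - mu0) / (s / sqrt(n))
    p_value = 0.5 * erfc(-z_obs / sqrt(2))  # standard normal cdf at z_obs
    return z_obs, p_value


ALPHA = 0.05  # the grocers' significance level
z_obs, p_value = mean_z_test_left(x_bar=15.90, s=0.35, n=49, mu0=16.0)
send_it_back = p_value < ALPHA
print(f"z_obs = {z_obs:.2f}, P-value = {p_value:.4f}, reject H0: {send_it_back}")
```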


(SEND IT BACK, YOU BURN ARTIST!)

(GOT THE MUNCHIES, MAN. I DIDN'T
THINK ANYONE WOULD
NOTICE IF I ATE A
LITTLE FROM EVERY...)





SMALL SAMPLE
TEST FOR THE POPULATION
MEAN


WE RETURN TO CHAMELEON MOTORS, AND ITS 10 M.P.H. CRASH TEST. THE
RIGHTEOUS INSURANCE COMPANY WILL INSURE AN AUTO ONLY IF THE MEAN
REPAIR COST AFTER A 10 M.P.H. COLLISION IS LESS THAN $1000. THE COMPANY
USES A STANDARD $\alpha = .05$ AS ITS SIGNIFICANCE LEVEL. SO:

$H_0: \mu \ge \$1000$ (MEAN COST IS TOO HIGH)

$H_a: \mu < \$1000$ (MEAN COST IS OK)

THE TEST STATISTIC HAS THE t DISTRIBUTION:

$t = \dfrac{\bar{x} - \mu_0}{SE(\bar{x})}$

WHERE $\mu_0$ IS THE HYPOTHETICAL MEAN OF $1000,

AND WE WANT OUR OBSERVED
t VALUE TO LIE TO THE LEFT
OF $-t_{\alpha}$ (BECAUSE LOW $\bar{x}$ IS
DESIRABLE, $\bar{x} - \mu_0$ SHOULD BE
NEGATIVE TO SUPPORT $H_a$).


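CRITICAL VALUES OF t

DEGREES OF                 α
FREEDOM        .05       .025      .005
1              6.31      12.71     63.66
2              2.92      4.30      9.92
3              2.35      3.18      5.84
4              2.13      2.78      4.60
5              2.02      2.57      4.03


FROM THE TABLE OF CRITICAL
t VALUES (WITH 5 CARS THERE ARE 4 DEGREES OF FREEDOM), WE SEE THAT
$t_{.05} = 2.13$, SO WE DECIDE TO
REJECT $H_0$ IF

$t_{obs} < -t_{.05} = -2.13$

FROM CHAPTER 6, WE HAVE
$\bar{x} = \$540$ AND $s = \$299$, SO

$t_{obs} = \dfrac{540 - 1000}{299/\sqrt{5}} = -3.44$

WHICH LIES WELL TO THE LEFT OF $-2.13$.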



THE CAR PASSES THE TEST: $H_0$ IS REJECTED AND THE INSURANCE POLICY IS
ISSUED.
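
FOR READERS WHO WANT TO SEE THE NUMBERS BEHIND THAT VERDICT, HERE IS A
MINIMAL PYTHON SKETCH. IT ASSUMES A SAMPLE OF 5 CRASHED CARS (SO 4
DEGREES OF FREEDOM) AND TAKES THE CRITICAL VALUE 2.13 STRAIGHT FROM
THE TABLE OF CRITICAL t VALUES; THE NAMES USED ARE OURS.

```python
from math import sqrt

# Critical value t_.05 with 4 degrees of freedom, copied from the printed table.
T_CRIT_05_DF4 = 2.13


def one_sample_t(x_bar: float, s: float, n: int, mu0: float) -> float:
    """Return the observed t statistic (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))


# Assumed sample: 5 cars, mean repair cost $540, standard deviation $299.
t_obs = one_sample_t(x_bar=540, s=299, n=5, mu0=1000)

# Left-tailed test: reject H0 (mean cost too high) when t_obs < -t_crit.
reject_h0 = t_obs < -T_CRIT_05_DF4
print(f"t_obs = {t_obs:.2f}, reject H0: {reject_h0}")
```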



THIS IS AN EXAMPLE OF ACCEPTANCE SAMPLING. THE NULL HYPOTHESIS IS
THAT REPAIR COSTS ARE UNACCEPTABLE, AND THE MOTOR COMPANY IS
ASSUMED GUILTY UNTIL IT PRESENTS SUFFICIENT EVIDENCE OF ITS
INNOCENCE, I.E., THAT ITS PRODUCT IS WITHIN SPECIFICATIONS.

DECISION THEORY

WE CAN THINK OF HYPOTHESIS TESTING AND
SIGNIFICANCE TESTS IN TERMS OF A HOUSEHOLD
SMOKE DETECTOR. IF YOU HAVE ONE OF THESE
WHERE YOU LIVE, YOU'VE PROBABLY NOTICED HOW IT
TENDS TO GO OFF EVERY TIME YOU MAKE THE TOAST
TOO DARK!


THIS IS WHAT IS CALLED A TYPE I ERROR: AN ALARM WITHOUT A FIRE.
CONVERSELY, A TYPE II ERROR IS A FIRE WITHOUT AN ALARM. EVERY COOK
KNOWS HOW TO AVOID A TYPE I ERROR: JUST REMOVE THE BATTERIES.
UNFORTUNATELY, THIS INCREASES THE INCIDENCE OF TYPE II ERRORS!






WE CAN SUMMARIZE THIS IN A TWO-BY-TWO DECISION TABLE.

              NO FIRE       FIRE
NO ALARM      NO ERROR      TYPE II
ALARM         TYPE I        NO ERROR


NOW THINK OF THE NULL HYPOTHESIS AS THE CONDITION OF NO FIRE, WHILE
THE ALTERNATE HYPOTHESIS IS THAT A FIRE IS BURNING. THE ALARM
CORRESPONDS TO REJECTION OF THE NULL HYPOTHESIS:

                   TRUE STATE
                H_0           H_a
ACCEPT H_0      NO ERROR      TYPE II
REJECT H_0      TYPE I        NO ERROR


ALL THE SIGNIFICANCE TESTS WE DID EARLIER IN THIS CHAPTER EMPHASIZED
THE PROBABILITY OF COMMITTING A TYPE I ERROR, I.E., THE PROBABILITY OF
OUR OBSERVATIONS OCCURRING IF $H_0$ WAS TRUE. WE DEMANDED THAT

$\Pr(\text{REJECTING } H_0 \mid H_0) = \Pr(\text{TYPE I ERROR} \mid H_0) = \alpha$


$1 - \alpha$ MEASURES OUR CONFIDENCE THAT ANY ALARM BELLS WE HEAR ARE
GENUINE. HIGH CONFIDENCE MEANS RARELY SETTING OFF FALSE ALARMS.


BUT SOMETIMES WHAT WE REALLY WANT TO KNOW IS THE CHANCE OF MAKING
A TYPE II ERROR. IN OTHER WORDS, HOW SENSITIVE IS OUR "ALARM SYSTEM"
WHEN THE ALTERNATE HYPOTHESIS IS TRUE?


AN ENVIRONMENTAL
EXAMPLE:


IN THE PAST, FACTORIES DISCHARGING CHEMICALS INTO WATERWAYS WERE
REQUIRED TO SHOW THAT THE DISCHARGE HAD NO EFFECT ON THE DOWNSTREAM
WILDLIFE. THAT'S $H_0$. THE POLLUTER COULD CONTINUE AS LONG AS
THE NULL HYPOTHESIS WAS NOT REJECTED AT THE .05 SIGNIFICANCE LEVEL.


SO A POLLUTER, SUSPECTING THAT HE WAS IN VIOLATION OF EPA STANDARDS,
WOULD DEVISE AN INEFFECTIVE POLLUTION MONITORING PROGRAM.


(WE'LL INTERVIEW A FEW DUCKS!)

THE POLLUTER IS DELIGHTED SINCE, LIKE OUR SMOKE ALARM WITHOUT A
BATTERY, HIS TEST HAS LITTLE OR NO CHANCE OF SETTING OFF AN ALARM.





LET'S FORMALIZE THIS
IDEA. TO DESCRIBE THE
PROBABILITY OF A TYPE II
ERROR, WE BREAK OUT
ANOTHER GREEK LETTER:

BETA, OR $\beta$

$\beta = \Pr(\text{ACCEPTING } H_0 \mid H_a) = \Pr(\text{TYPE II ERROR} \mid H_a)$

THE POWER OF A TEST
IS DEFINED AS $1 - \beta$. IT'S

$\Pr(\text{REJECTING } H_0 \mid H_a)$.


YOU'LL BE HAPPY TO KNOW THE
ENVIRONMENTAL REGULATORS HAVE
MOVED IN THE DIRECTION OF REQUIRING
POLLUTION MONITORING PROGRAMS TO
SHOW THAT THEY HAVE A HIGH
PROBABILITY OF DETECTING SERIOUS
POLLUTION EVENTS. THE REQUIRED
POWER ANALYSIS OFTEN REVEALS
HIDDEN FLAWS IN THE MONITORING
PROGRAM.



M 




ONE WAY TO VISUALIZE THE EFFECT OF A TEST'S POWER IS BY GRAPHING THE
PROBABILITY OF REJECTING $H_0$ AGAINST THE ACTUAL STATE OF THE SYSTEM. IN
THE CASE OF A SMOKE ALARM, THE PROBABILITY CLIMBS TOWARD 1 AS THE
SMOKE GETS THICKER.


FOR THE E.P.A. WATER QUALITY EXAMPLE, THE HORIZONTAL AXIS IS THE TRUE
CONCENTRATION OF POLLUTANT IN THE WATER.


HERE ARE THE POWER CURVES FOR THREE MONITORING PROGRAMS: THE SAVE
EVERY LAST GUPPY (COSTS $5 MILLION), THE GOLDEN MEAN (COSTS
$500,000), AND DON'T ROCK THE BOAT (ALSO COSTS $500,000, BUT THEY PUT
ON A GOOD SHOW!). THE HIGHER THE TEST'S POWER, THE STEEPER THE CURVE.
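
TO MAKE THE IDEA OF A POWER CURVE CONCRETE, HERE IS A HEDGED PYTHON
SKETCH FOR A RIGHT-TAILED LARGE-SAMPLE z TEST. EVERYTHING NUMERIC IN IT
IS HYPOTHETICAL (A BASELINE CONCENTRATION OF 10 UNITS, A STANDARD
DEVIATION OF 4, AND TWO SAMPLE SIZES STANDING IN FOR A CHEAP AND AN
EXPENSIVE MONITORING PROGRAM); NONE OF IT COMES FROM THE E.P.A.
EXAMPLE.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def power_right_tailed_z(true_mean, mu0, sigma, n, alpha=0.05):
    """Pr(reject H0: mu = mu0 in favour of mu > mu0) when the mean is true_mean."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)          # rejection threshold for z_obs
    shift = (true_mean - mu0) / (sigma / sqrt(n))   # how far the truth moves z_obs
    return 1 - STD_NORMAL.cdf(z_crit - shift)


# Hypothetical programs: 5 water samples versus 50 water samples.
for n in (5, 50):
    curve = [round(power_right_tailed_z(m, mu0=10, sigma=4, n=n), 3)
             for m in (10, 11, 12, 14)]
    print(f"n = {n}: power at true means 10, 11, 12, 14 -> {curve}")
```

AT THE NULL VALUE THE "POWER" IS JUST $\alpha$, AND THE LARGER SAMPLE
CLIMBS TOWARD 1 MUCH FASTER, WHICH IS THE STEEPER CURVE DESCRIBED
ABOVE.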


CONGRATULATIONS! WITH THESE
SECTIONS COVERING THE BASICS OF
CONFIDENCE INTERVALS AND
HYPOTHESIS TESTING, YOU HAVE JUST
COMPLETED YOUR FIRST COURSE IN
CLASSICAL STATISTICS!


WHY THEN DO YOU HAVE SUCH AN EMPTY FEELING IN YOUR STOMACH?
BECAUSE, TO USE THESE IDEAS IN ANY PRACTICAL WAY, WE HAVE TO BE ABLE
TO APPLY THEM TO A VARIETY OF SITUATIONS WE HAVEN'T EVEN TOUCHED ON
YET. THAT IS WHERE WE ARE GOING NEXT, WITH THE COMPARISON OF TWO
POPULATIONS.


(ON TO THE POPULATIONS!)


♦ Chapter 9 ♦

COMPARING
TWO POPULATIONS


IN WHICH WE LEARN SOME NEW RECIPES USING
OLD INGREDIENTS



THE LAST TWO CHAPTERS EXPLAINED
CONFIDENCE INTERVALS AND
HYPOTHESIS TESTING WITH THE
STEAK AND POTATOES OF RANDOM
MODELS: THE NORMAL AND THE
BINOMIAL DISTRIBUTIONS.


(WITH THE NORMAL
PLAYING THE ROLE
OF THE POTATO!)


BUT WHAT MAKES STATISTICS ALMOST AS CHALLENGING AS COOKING IS THE
VARIETY. LIKE AN EXPERT COOK, THE STATISTICIAN CAN "TASTE" THE
INGREDIENTS IN A PROBLEM AND THEN FIND THE MOST EFFECTIVE WAY TO
COMBINE THEM INTO A STATISTICAL RECIPE.


(THE REASON COOKBOOKS AND STATISTICAL METHODS TEXTS ARE SO HEAVY IS
THAT THEY BOTH PROVIDE SOLUTIONS IN A GREAT VARIETY OF SITUATIONS!)

IN THIS CHAPTER, WE'LL USE OUR MEAT
AND POTATOES METHODS IN SOME NEW
RECIPES THAT WILL HELP US ANSWER
THE FOLLOWING QUESTIONS:


DOES A PARTICULAR PESTICIDE
INCREASE THE YIELD OF CORN PER
ACRE?


DO MEN AND WOMEN IN THE SAME
OCCUPATION HAVE DIFFERENT SALARIES?


AND, AT THE END OF THE CHAPTER,
WE'LL LOOK AT A DIFFERENT WAY TO
COMPARE TWO MEANS THAT DOESN'T
INVOLVE TAKING TWO SIMPLE
RANDOM SAMPLES...


THE COMMON INGREDIENT IN THESE
QUESTIONS IS THIS: THEY CAN BE
ANSWERED BY COMPARING TWO
INDEPENDENT RANDOM SAMPLES,
ONE FROM EACH OF TWO
POPULATIONS.


Comparing SUCCESS RATES

(or failure rates) for two populations.

WE BEGIN WITH AN EXPERIMENT, PART OF A HARVARD STUDY, THAT SOUGHT TO
DECIDE THE EFFECTIVENESS OF ASPIRIN IN REDUCING HEART ATTACKS. AS IN
MOST CLINICAL TRIALS, THE CHANCE THAT ANY ONE INDIVIDUAL GETS THE
DISEASE (IN THIS CASE, A HEART ATTACK) IS VERY SMALL IN ANY GIVEN YEAR.
BUT WE WANT ANSWERS QUICKLY! WHAT DO WE DO?


THE SIMPLE, BUT EXPENSIVE, SOLUTION IS TO TEST A LARGE NUMBER OF
INDIVIDUALS IN A SHORT TIME. IN THIS STUDY, 22,071 SUBJECTS (ALL
VOLUNTEER DOCTORS) WERE RANDOMLY ASSIGNED TO TWO GROUPS.


GROUP ONE TOOK A PLACEBO: A
PILL IDENTICAL TO ASPIRIN, BUT
CONTAINING NO ASPIRIN.


GROUP TWO RECEIVED ONE
ASPIRIN A DAY.


OVER A PERIOD AVERAGING
NEARLY FIVE YEARS*, THE
INVESTIGATORS RECORDED
THE RESPONSE: HEART
ATTACK OR NO HEART ATTACK.
THE RESULT: (IN THE
NUMBERS THAT FOLLOW, WE
HAVE COMBINED FATAL AND
NONFATAL HEART ATTACKS.)


            ATTACK    NO ATTACK    n         ATTACK RATE
PLACEBO     239       10,795       11,034    $\hat{p}_1 = 239/11{,}034 = .0217$
ASPIRIN     139       10,898       11,037    $\hat{p}_2 = 139/11{,}037 = .0126$



THE OBSERVED DIFFERENCE
IN SUCCESS RATE IS

$\hat{p}_1 - \hat{p}_2 = .0091$. IT SOUNDS
SMALL UNTIL YOU LOOK AT
THE RELATIVE RISK,

$\dfrac{\hat{p}_1}{\hat{p}_2} = \dfrac{.0217}{.0126} = 1.72$.


MEMBERS OF THE PLACEBO
GROUP WERE 1.72 TIMES
LIKELIER TO SUFFER A HEART
ATTACK THAN THOSE IN THE
ASPIRIN GROUP.


*THE STUDY WAS STOPPED EARLY BECAUSE OF ITS POSITIVE OUTCOME. IT WOULD
HAVE BEEN UNWISE AND IMPRACTICAL TO DENY THE RESULTS TO THE GROUP
TAKING THE PLACEBO.

The Model: THE PLACEBO AND ASPIRIN GROUP OBSERVATIONS
ARE INDEPENDENT SAMPLES FROM TWO BINOMIAL POPULATIONS. FOR
CONSISTENCY, WE REFER TO A HEART ATTACK AS A SUCCESS (!).


PLACEBO: POPULATION ONE, CHANCE OF SUCCESS $= p_1$
ASPIRIN: POPULATION TWO, CHANCE OF SUCCESS $= p_2$


THE OBJECTIVE IS TO ESTIMATE THE TRUE DIFFERENCE, $p_1 - p_2$.


FOR EACH POPULATION (ACTUALLY LARGE SAMPLES OF THE GENERAL
POPULATIONS), WE HAVE THE FAMILIAR RANDOM VARIABLES:

$X_1$ = NUMBER OF SUCCESSES IN POPULATION ONE
$X_2$ = NUMBER OF SUCCESSES IN POPULATION TWO

$\hat{P}_1 = X_1/n_1$ = PROPORTION OF SUCCESSES IN POPULATION ONE
$\hat{P}_2 = X_2/n_2$ = PROPORTION OF SUCCESSES IN POPULATION TWO

AND AN ESTIMATOR OF DIFFERENCE IN RATE: $\hat{P}_1 - \hat{P}_2$


AND NOW, LIKE A BROKEN
RECORD, WE ASK OURSELVES,
HOW IS $\hat{P}_1 - \hat{P}_2$ DISTRIBUTED?


Sampling distribution for $\hat{P}_1 - \hat{P}_2$


FOR LARGE SAMPLES, $\hat{P}_1 - \hat{P}_2$
IS APPROXIMATELY
NORMALLY DISTRIBUTED,
MUCH AS IN THE CASE OF
ONLY ONE SAMPLE. WE CAN
MAKE THE USUAL Z-TRANSFORM
TO GET A
STANDARD NORMAL RANDOM
VARIABLE (APPROXIMATELY):

$Z = \dfrac{\hat{P}_1 - \hat{P}_2 - (p_1 - p_2)}{\sigma(\hat{P}_1 - \hat{P}_2)}$

BUT HOW DO WE FIND
THAT STANDARD DEVIATION
IN THE DENOMINATOR?


SINCE THE TWO SAMPLES ARE INDEPENDENT, SO ARE THE RANDOM VARIABLES
$\hat{P}_1$ AND $\hat{P}_2$, AND THE TWO VARIANCES ADD:

$\sigma(\hat{P}_1 - \hat{P}_2) = \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$


(I RECOMMEND
AN ASPIRIN TO
GET THROUGH
THIS!)


AND NOW, KNOWING
THE DISTRIBUTION
OF THE TEST
STATISTIC, WE CAN
PROCEED TO
ESTIMATE
CONFIDENCE
INTERVALS AND
TEST THE
HYPOTHESIS THAT
ASPIRIN REDUCES
HEART ATTACKS.






Confidence
Intervals for
$p_1 - p_2$


AS USUAL, THE CONFIDENCE INTERVALS
FOR OUR ESTIMATE LOOK LIKE THIS:

$p_1 - p_2 = \hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\, SE(\hat{p}_1 - \hat{p}_2)$

HERE $p_1 - p_2$ IS THE TRUE DIFFERENCE OF POPULATION PROPORTIONS,
$\hat{p}_1 - \hat{p}_2$ IS THE OBSERVED DIFFERENCE, $z_{\alpha/2}$ IS THE
CRITICAL VALUE, AND $SE$ IS THE STANDARD ERROR.


THE VARIANCES OF $\hat{P}_1$ AND $\hat{P}_2$ ADD, SO
THE STANDARD ERROR BECOMES

$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

IN THE ASPIRIN STUDY, THE STANDARD
ERROR IS

$\sqrt{\dfrac{(.0217)(.9783)}{11{,}034} + \dfrac{(.0126)(.9874)}{11{,}037}} = .00175$


TO GET THE 95% CONFIDENCE
INTERVAL FOR THE ASPIRIN
STUDY, WE JUST PLUG IN THE
OBSERVED VALUES:

$p_1 - p_2 = .0091 \pm (1.96)(.00175) = .0091 \pm .0034$


WE ARE AT LEAST 95%
CONFIDENT THAT THE
DIFFERENCE IN HEART ATTACK
RATES IS BETWEEN .0057 AND
.0125. DEFINITELY A POSITIVE
NUMBER! WE ARE NOW AT
LEAST 95% CONFIDENT THAT
ASPIRIN REALLY DOES LOWER
HEART ATTACK RATES.
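
THE INTERVAL ABOVE CAN BE RECOMPUTED FROM THE RAW COUNTS WITH A SHORT
PYTHON SKETCH. IT TAKES THE COUNTS AS 239 ATTACKS AMONG 11,034 ON
PLACEBO AND 139 AMONG 11,037 ON ASPIRIN, AND USES 1.96 AS THE 95%
CRITICAL VALUE; THE FUNCTION NAME IS OUR OWN.

```python
from math import sqrt


def diff_proportions_ci(x1: int, n1: int, x2: int, n2: int, z_crit: float = 1.96):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error).

    Returns (observed difference, standard error, (lower, upper)).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_crit * se
    return diff, se, (diff - margin, diff + margin)


diff, se, (low, high) = diff_proportions_ci(x1=239, n1=11034, x2=139, n2=11037)
print(f"difference = {diff:.4f}, SE = {se:.5f}, 95% CI = ({low:.4f}, {high:.4f})")
```

WORKING FROM UNROUNDED PROPORTIONS MAY SHIFT THE LAST PRINTED DIGIT
SLIGHTLY RELATIVE TO THE HAND CALCULATION, BUT THE INTERVAL STAYS
CLEAR OF ZERO.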





hypothesis
testing


THE FORMAL HYPOTHESIS-TESTING
QUESTION IS:

$H_0$, THE NULL HYPOTHESIS, IS THAT
ASPIRIN HAD NO EFFECT: $p_1 = p_2$.

$H_a$, THE ALTERNATIVE, IS THAT
ASPIRIN DOES REDUCE THE HEART
ATTACK RATE: $p_1 > p_2$.


NOW WE NEED A TEST STATISTIC WITH
A NORMAL DISTRIBUTION WHEN $H_0$ IS
TRUE...


NOTE THAT UNDER $H_0$, THE TWO
PROPORTIONS ARE THE SAME,
$p_1 = p_2 = p$, SO LET'S POOL ALL THE
DATA TO GET THE PROPORTION OF
HEART ATTACKS IN BOTH SAMPLES
TOGETHER:

$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$


WHEN THE NULL HYPOTHESIS IS
TRUE, THE STANDARD ERROR
DEPENDS ONLY ON THIS POOLED
ESTIMATE:

$SE_0(\hat{P}_1 - \hat{P}_2) = \sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

AND WE CAN WRITE A TEST
STATISTIC:

$Z = \dfrac{\hat{P}_1 - \hat{P}_2}{SE_0(\hat{P}_1 - \hat{P}_2)}$

(THE NUMERATOR WOULD
ORDINARILY BE $\hat{P}_1 - \hat{P}_2 - (p_1 - p_2)$,
BUT $H_0$ ASSUMES $p_1 - p_2 = 0$.)


FOR THE ASPIRIN STUDY WE FIND

$\hat{p} = \dfrac{378}{22{,}071} = .0171$

$SE_0(\hat{P}_1 - \hat{P}_2) = .00175$

SO

$Z_{obs} = \dfrac{.0091}{.00175} = 5.20$

$Z_{obs}$ IS MORE THAN FIVE STANDARD DEVIATIONS FROM ZERO, A STRONG
POSITIVE EFFECT. USING A TABLE OR A COMPUTER, WE FIND THE P-VALUE:


P-VALUE $= \Pr(Z \ge z_{obs}) = \Pr(Z \ge 5.2) = .0000001$


IF THE NULL HYPOTHESIS WERE TRUE, THE PROBABILITY OF OBSERVING AN
EFFECT THIS LARGE IS ONE IN TEN MILLION: VERY STRONG EVIDENCE
AGAINST $H_0$!
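
THE POOLED TEST JUST WORKED THROUGH CAN BE SKETCHED IN PYTHON AS
FOLLOWS. THE COUNTS ARE TAKEN AS 239 OF 11,034 (PLACEBO) AND 139 OF
11,037 (ASPIRIN), THE RIGHT-TAIL AREA COMES FROM `math.erfc`, AND THE
FUNCTION NAME IS JUST A LABEL FOR THIS ILLUSTRATION.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Right-tailed test of H0: p1 = p2 against Ha: p1 > p2 with a pooled SE.

    Returns (pooled proportion, SE_0, z_obs, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # common p under H0
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z_obs = (p1 - p2) / se0
    p_value = 0.5 * erfc(z_obs / sqrt(2))               # Pr(Z > z_obs)
    return pooled, se0, z_obs, p_value


pooled, se0, z_obs, p_value = two_proportion_z_test(239, 11034, 139, 11037)
print(f"pooled = {pooled:.4f}, SE0 = {se0:.5f}, z_obs = {z_obs:.2f}, P-value = {p_value:.1e}")
```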

V_ 


The
general
recipe:

TO TEST THE NULL HYPOTHESIS

$H_0: p_1 = p_2$

COMPUTE THE TEST STATISTIC

$Z_{obs} = \dfrac{\hat{p}_1 - \hat{p}_2}{SE_0(\hat{p}_1 - \hat{p}_2)}$

WHERE $SE_0$ IS COMPUTED USING
THE POOLED PROBABILITY
OBTAINED BY COMBINING BOTH
GROUPS.


THE RELEVANT P-VALUE DEPENDS ON
THE ALTERNATE HYPOTHESIS:

a) TWO-SIDED $H_a: p_1 \ne p_2$
P-VALUE $= \Pr(|Z| > |Z_{obs}|)$

b) RIGHT $H_a: p_1 > p_2$
P-VALUE $= \Pr(Z > Z_{obs})$

c) LEFT $H_a: p_1 < p_2$
P-VALUE $= \Pr(Z < Z_{obs})$

THE ANALYSIS OF THE ASPIRIN STUDY DEPENDED ON CERTAIN FEATURES OF THE
EXPERIMENT DESIGNED TO ENSURE RANDOMNESS AND TO ELIMINATE BIAS:


POINTS 1 AND 2 ARE ESSENTIAL
PARTS OF MOST HUMAN CLINICAL
TRIAL DESIGNS, BUT POINT 3 IS
NOT ESSENTIAL. GOOD SMALL-SAMPLE
TESTS DO EXIST AND
ARE AVAILABLE IN COMPUTER
SOFTWARE PACKAGES. THESE
NONPARAMETRIC PROCEDURES
DEPEND ON SIMPLE, BUT
LENGTHY, PROBABILITY
CALCULATIONS SIMILAR TO THE
GAMBLING COMPUTATIONS WE
ENCOUNTERED IN CHAPTER A...
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Comparing the 

MEANS of two populations


SUPPOSE WE WANTED TO COMPARE THE
AVERAGE SALARY OF MALE AND FEMALE
EMPLOYEES IN THE SAME JOB AT SOME
COMPANY.


POPULATION ONE HAS MEAN SALARY $\mu_1$ AND STANDARD DEVIATION $\sigma_1$.

POPULATION TWO HAS MEAN SALARY $\mu_2$ AND STANDARD DEVIATION $\sigma_2$.


A RANDOM SAMPLE OF SIZE $n_1$ FROM GROUP 1 AND $n_2$ FROM GROUP 2 GIVES
SAMPLE MEANS OF $\bar{X}_1$ AND $\bar{X}_2$ AND STANDARD DEVIATIONS $S_1$ AND $S_2$,
RESPECTIVELY. THE ESTIMATOR OF $\mu_1 - \mu_2$ IS

$\bar{X}_1 - \bar{X}_2$


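HOW GOOD AN ESTIMATOR IS $\bar{X}_1 - \bar{X}_2$?
FOR LARGE SAMPLE SIZES, IT'S
APPROXIMATELY NORMAL (BY THE
CENTRAL LIMIT THEOREM) AND ITS
STANDARD ERROR IS

$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$

(THE VARIANCES ADD, SINCE
SAMPLES ARE INDEPENDENT.) NOW
WE CAN PROCEED DIRECTLY TO:

confidence
intervals: FOR
LARGE SAMPLE SIZES, THE $(1-\alpha)100\%$
CONFIDENCE INTERVAL FOR THE
DIFFERENCE BETWEEN MEANS IS

$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\, SE(\bar{x}_1 - \bar{x}_2)$


hypothesis testing: WE ASSESS
THE NULL HYPOTHESIS THAT THE TWO POPULATION MEANS ARE EQUAL,
$H_0: \mu_1 = \mu_2$, WITH THE TEST STATISTIC

$Z_{obs} = \dfrac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$

AND THE P-VALUES WORK IN
THE USUAL WAY.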



AND HOW ABOUT COMPARING
SMALL SAMPLE
MEANS?


REMEMBER CHAMELEON MOTORS? THEIR COMPETITOR, IGUANA AUTO, CLAIMS
THAT ITS STYROFOAM HOOD ORNAMENT GIVES BETTER FRONT END CRASH
PROTECTION, AND THEY'VE CRASHED SEVEN IGUANAS TO PROVE IT!


THEIR RESULTS, COMPARED WITH CHAMELEON'S:


CHAMELEON                  IGUANA
CAR    REPAIR COST         CAR    REPAIR COST
1      $150                1      $50
2      $400                2      $200
3      $720                3      $150
4      $500                4      $400
5      $930                5      $750
                           6      $400
                           7      $150

CHAMELEON: $n_1 = 5$, $\bar{x}_1 = \$540$, $s_1 = \$299$
IGUANA: $n_2 = 7$, $\bar{x}_2 = \$300$, $s_2 = \$238$

THE t DISTRIBUTION CAN BE USED
IF BOTH POPULATIONS ARE MOUND
SHAPED AND HAVE THE SAME
STANDARD DEVIATION $\sigma = \sigma_1 = \sigma_2$.
THE ONLY WRINKLE IS THAT WE
HAVE TO POOL THE SUM OF
SQUARES ABOUT THE MEANS TO
FORM A SINGLE ESTIMATE OF $\sigma^2$:

$s_{pool}^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$


THE STANDARD ERROR IS THE SAME AS
FOR LARGE SAMPLES, EXCEPT THAT
$s_{pool}$ REPLACES $s_1$ AND $s_2$:

$SE(\bar{x}_1 - \bar{x}_2) = s_{pool}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

THE $(1-\alpha)100\%$ CONFIDENCE
INTERVAL IS

$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\, SE(\bar{x}_1 - \bar{x}_2)$

WHERE $t_{\alpha/2}$ IS A CRITICAL VALUE OF t
WITH $n_1 + n_2 - 2$ DEGREES OF FREEDOM.
THE REPTILIAN CARMAKERS AGREE THAT THEIR STANDARD DEVIATIONS ARE
CLOSE AND THEIR REPAIR HISTOGRAMS ARE MOUND-SHAPED. THEY COMPUTE:

$s_{pool}^2 = \dfrac{4 \cdot 299^2 + 6 \cdot 238^2}{10}$, SO $s_{pool} = 264$

$SE(\bar{x}_1 - \bar{x}_2) = 154$

THE 95% CONFIDENCE INTERVAL IS

$\mu_1 - \mu_2 = 540 - 300 \pm t_{.025}(154)$

$= 240 \pm (2.23)(154)$

$\approx 240 \pm 340$


SINCE THIS INCLUDES THE VALUE 0,
IGUANA AUTOS HAS NOT SHOWN A
SIGNIFICANT IMPROVEMENT IN
REPAIR COSTS.
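
HERE IS A HEDGED PYTHON SKETCH OF THE POOLED SMALL-SAMPLE INTERVAL.
THE REPAIR COSTS BELOW ARE OUR READING OF THE CRASH-TEST TABLE, THE
CRITICAL VALUE 2.228 (t WITH 10 DEGREES OF FREEDOM, UPPER TAIL .025)
IS A STANDARD TABLE VALUE TYPED IN BY HAND, AND THE FUNCTION NAME IS
ONLY ILLUSTRATIVE.

```python
from math import sqrt
from statistics import mean, stdev

# Repair costs in dollars, as read from the crash-test table (an assumption).
chameleon = [150, 400, 720, 500, 930]
iguana = [50, 200, 150, 400, 750, 400, 150]

T_CRIT_025_DF10 = 2.228  # t critical value for 5 + 7 - 2 = 10 degrees of freedom


def pooled_t_interval(a, b, t_crit):
    """Confidence interval for mu_a - mu_b assuming equal population variances."""
    n1, n2 = len(a), len(b)
    s1, s2 = stdev(a), stdev(b)                       # sample standard deviations
    s_pool = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = s_pool * sqrt(1 / n1 + 1 / n2)
    diff = mean(a) - mean(b)
    return diff, s_pool, se, (diff - t_crit * se, diff + t_crit * se)


diff, s_pool, se, (low, high) = pooled_t_interval(chameleon, iguana, T_CRIT_025_DF10)
print(f"difference = {diff}, s_pool = {s_pool:.0f}, SE = {se:.0f}")
print(f"95% CI = ({low:.0f}, {high:.0f}); contains 0: {low < 0 < high}")
```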


(OK, FORGET SAFETY...
BUT YOU CAN'T ARGUE
WITH BEAUTIFUL STYLING!)

HERE'S AN EXAMPLE THAT SHOWS THE
PITFALLS OF MINDLESSLY FOLLOWING
THE COOKBOOK: A LARGE TAXI FLEET
OWNER WANTS TO COMPARE THE GAS
MILEAGE USING GAS A AND GAS B.


STARTING WITH 100 CABS, HE RANDOMLY ASSIGNS 50 TO EACH GASOLINE, AND,
AFTER A DAY'S DRIVING, DETERMINES:


         SAMPLE SIZE    MEAN MILEAGE    STANDARD DEVIATION
GAS A    50             25              5.00
GAS B    50             26              4.00
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PAIRED COMPARISONS 

a better way to compare gasolines 


WHERE'D THE TAXI OWNER GO WRONG? HE FOLLOWED THE
COOKBOOK EXACTLY. HIS SAMPLES
WERE RANDOM, AND HIS SAMPLE
SIZE WAS LARGE ENOUGH. HE
JUST FAILED TO THINK WHEN
NECESSARY!


ALTHOUGH GAS B APPEARS TO BE SLIGHTLY BETTER THAN GAS A, THE
CONFIDENCE INTERVAL WAS WIDE BECAUSE OF THE LARGE STANDARD
DEVIATIONS, I.E., THE MILEAGES VARIED WIDELY FROM ONE CAB TO THE
NEXT. WHY SUCH HIGH VARIABILITY? BECAUSE CABS, AND CABBIES, HAVE
DIFFERENT PERSONALITIES!
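
TO SEE JUST HOW WIDE THAT UNPAIRED INTERVAL IS, HERE IS A SMALL PYTHON
SKETCH. IT ASSUMES THE SUMMARY TABLE READS 25 MPG (STANDARD DEVIATION
5.00) FOR GAS A AND 26 MPG (STANDARD DEVIATION 4.00) FOR GAS B, WITH 50
CABS EACH, AND IT ASSUMES A 95% LEVEL; THE FUNCTION NAME IS OURS.

```python
from math import sqrt


def large_sample_mean_diff_ci(x1, s1, n1, x2, s2, n2, z_crit=1.96):
    """Large-sample confidence interval for mu1 - mu2 from summary statistics."""
    diff = x1 - x2
    se = sqrt(s1**2 / n1 + s2**2 / n2)   # variances add for independent samples
    margin = z_crit * se
    return diff, se, (diff - margin, diff + margin)


# Gas A minus gas B, using the assumed table values.
diff, se, (low, high) = large_sample_mean_diff_ci(25, 5.00, 50, 26, 4.00, 50)
print(f"difference = {diff}, SE = {se:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
print("interval contains 0:", low < 0 < high)
```

WITH THESE NUMBERS THE INTERVAL STRADDLES ZERO, SO THE ONE-MPG
ADVANTAGE CANNOT BE DISTINGUISHED FROM CAB-TO-CAB NOISE.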




A FAR BETTER WAY TO DO THIS STUDY IS TO ASSIGN GAS A AND GAS B TO THE
SAME CAB ON DIFFERENT DAYS.


WE STILL RANDOMIZE THE TREATMENT BY FLIPPING A COIN TO DECIDE
WHETHER TO USE GAS A ON TUESDAY OR WEDNESDAY. WE CAN ALSO CUT THE
EXPERIMENT DOWN TO 10 CABS, SAVING THE OWNER A LOT OF TIME AND
MONEY!




CAB                   GAS A    GAS B    DIFFERENCE
1                     27.01    26.95      0.06
2                     20.00    20.44     -0.44
3                     23.41    25.05     -1.64
4                     25.22    26.32     -1.10
5                     30.11    29.56      0.55
6                     25.55    26.60     -1.05
7                     22.23    22.93     -0.70
8                     19.78    20.23     -0.45
9                     33.45    33.95     -0.50
10                    25.22    26.01     -0.79
MEAN                  25.20    25.80     -0.60
STANDARD DEVIATION     4.27     4.10      0.61

NOTE THAT THE MEANS AND STANDARD DEVIATIONS OF GAS A AND GAS B ARE
ABOUT THE SAME. THAT'S TO BE EXPECTED, SINCE THEY HAVE THE SAME SOURCE
OF VARIABILITY AS IN THE UNPAIRED EXPERIMENT. BUT NOW THE DIFFERENCE
COLUMN HAS A VERY SMALL STANDARD DEVIATION. THE DIFFERENCE COLUMN,
BY COMPARING GAS PERFORMANCE WITHIN A SINGLE CAR, ELIMINATES
VARIABILITY BETWEEN TAXIS.





THE DIFFERENCES $d_i$ PROVIDE A
SINGLE MEASURE OF
DIFFERENCE FOR EACH TAXI,
AND WE CAN USE IT TO MAKE A
SMALL-SAMPLE t TEST STATISTIC

$t = \dfrac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$


SO WE HAVE $-1.04 < \mu_d < -.16$ WITH 95% CONFIDENCE, GOOD EVIDENCE THAT
GAS B REALLY IS BETTER.


THE HYPOTHESIS-TESTING P-VALUE CAN BE FOUND USING A SOFTWARE
PACKAGE:

P-VALUE $= \Pr(|t| > |t_{obs}|)$

$= \Pr\left(|t| > \left|\dfrac{\bar{d}}{s_d/\sqrt{n}}\right|\right)$

$= \Pr(|t| > 3.15)$

$= .012 < .05$

AGAIN, GAS B PASSES THE TEST.
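
THE PAIRED ANALYSIS CAN BE SKETCHED IN PYTHON AS FOLLOWS. THE MILEAGES
ARE OUR READING OF THE TEN-CAB TABLE, THE CRITICAL VALUE 2.262 (t WITH
9 DEGREES OF FREEDOM, UPPER TAIL .025) IS A STANDARD TABLE VALUE
ENTERED BY HAND, AND ALL NAMES ARE ILLUSTRATIVE. BECAUSE IT WORKS FROM
UNROUNDED DIFFERENCES, THE t VALUE IT PRINTS MAY DIFFER IN THE SECOND
DECIMAL FROM THE ONE QUOTED ABOVE.

```python
from math import sqrt
from statistics import mean, stdev

# Miles per gallon for the same ten cabs on each gasoline (assumed table values).
gas_a = [27.01, 20.00, 23.41, 25.22, 30.11, 25.55, 22.23, 19.78, 33.45, 25.22]
gas_b = [26.95, 20.44, 25.05, 26.32, 29.56, 26.60, 22.93, 20.23, 33.95, 26.01]

T_CRIT_025_DF9 = 2.262  # t critical value for n - 1 = 9 degrees of freedom

# One difference per cab: the pairing removes cab-to-cab variability.
diffs = [a - b for a, b in zip(gas_a, gas_b)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
se_d = s_d / sqrt(n)

t_obs = d_bar / se_d                                   # tests H0: mu_d = 0
low, high = d_bar - T_CRIT_025_DF9 * se_d, d_bar + T_CRIT_025_DF9 * se_d

print(f"mean difference = {d_bar:.2f}, s_d = {s_d:.2f}, t_obs = {t_obs:.2f}")
print(f"95% CI for mu_d = ({low:.2f}, {high:.2f})")
```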

HERE ARE PLOTS OF THE GAS MILEAGE DATA. THE FIRST ONE SHOWS THE
MILEAGES UNPAIRED:

[DOT PLOT: GAS B AND GAS A MILEAGES ON A COMMON AXIS, MILES PER GALLON, 20 TO 34]

AND HERE'S THE SAME DATA PAIRED BY TAXICAB:

[PLOT: EACH CAB'S TWO MILEAGES JOINED BY A LINE; AXIS: MILES PER GALLON]


THE PREDOMINANCE OF RIGHT-LEANING
LINES IS A STRONG
HINT THAT GAS B GIVES
BETTER MILEAGE.


(WHAT
RIGHT-LEANING
LINES?)

A PAIRED COMPARISON EXPERIMENT IS ONE OF THE MOST EFFECTIVE WAYS TO
REDUCE NATURAL VARIABILITY WHILE COMPARING TREATMENTS. FOR EXAMPLE, IN
COMPARING HAND CREAMS, THE TWO BRANDS ARE RANDOMLY ASSIGNED TO
EACH SUBJECT'S RIGHT OR LEFT HAND. THIS ELIMINATES VARIABILITY DUE TO
SKIN DIFFERENCES.



OR, IN COMPARING TWO BREAKFAST CEREALS, EACH TASTER RATES BOTH
CEREALS (IN RANDOM ORDER). THE PAIRED COMPARISON REMOVES THE
NATURAL BIAS OF THE TASTER FOR OR AGAINST CEREAL IN GENERAL.

IN THIS CHAPTER, WE APPLIED THE
BASIC IDEAS ABOUT CONFIDENCE
INTERVALS AND HYPOTHESIS
TESTING TO THE COMPARISON OF
TWO POPULATIONS. THERE ARE
INNUMERABLE FURTHER POSSIBILITIES.
WE COULD HAVE GONE ON
TO DESCRIBE COMPARISONS OF

• THE STANDARD
DEVIATIONS OF TWO
POPULATIONS WHEN
SAMPLE SIZE IS
SMALL,

• THE MEANS OF
MORE THAN TWO
POPULATIONS WHEN
SAMPLE SIZE IS
LARGE,

• THE MEANS OF
MORE THAN TWO
POPULATIONS WHEN
SAMPLE SIZE IS
SMALL.


IN PRACTICE, STATISTICIANS DETERMINE THE GENERAL NATURE OF THE
PROBLEM, AND THEN CONSULT THE RIGHT REFERENCE BOOK.


THE ONLY THING REALLY NEW
IN THE CHAPTER WAS THE IDEA
OF THE PAIRED COMPARISON
TEST. IN THE NEXT CHAPTER,
WE'LL LOOK AT SOME OTHER
KINDS OF EXPERIMENTAL
DESIGNS.





♦ Chapter 10 ♦

EXPERIMENTAL
DESIGN


THE DESIGN OF AN EXPERIMENT OFTEN SPELLS SUCCESS OR FAILURE.
IN THE PAIRED COMPARISONS EXAMPLE, OUR STATISTICIAN CHANGED
ROLES FROM PASSIVE NUMBER GATHERING AND ANALYSIS TO ACTIVE
PARTICIPATION IN THE DESIGN OF THE EXPERIMENT.


IN THIS CHAPTER, WE
INTRODUCE THE BASIC
IDEAS OF EXPERIMENTAL
DESIGN,
WHILE LEAVING THE
DETAILED NUMERICAL
ANALYSIS TO YOUR
HANDY STATISTICAL
SOFTWARE PACKAGE.





TODAY, EXPERIMENTAL DESIGN IDEAS
ARE USED EXTENSIVELY IN INDUSTRIAL
PROCESS OPTIMIZATION, MEDICINE,
AND SOCIAL SCIENCE. ANY EXPERIMENTAL
DESIGN USES THREE BASIC
PRINCIPLES, WHICH ARE CLEARLY
ILLUSTRATED IN OUR CAB EXAMPLE.


Replication: THE SAME
TREATMENTS ARE ASSIGNED TO
DIFFERENT EXPERIMENTAL UNITS.
WITHOUT REPLICATION, IT'S
IMPOSSIBLE TO ASSESS NATURAL
VARIABILITY AND MEASUREMENT
ERROR.


Local control REFERS
TO ANY METHOD THAT ACCOUNTS FOR
AND REDUCES NATURAL VARIABILITY.
ONE WAY IS TO GROUP SIMILAR
EXPERIMENTAL UNITS INTO BLOCKS.
IN THE CAB EXAMPLE, BOTH GASOLINES
WERE USED IN EACH CAR, AND
WE SAY THAT THE CAB IS A BLOCK.


Randomization:
THE ESSENTIAL STEP IN ALL
STATISTICS! TREATMENTS MUST BE
ASSIGNED RANDOMLY TO EXPERIMENTAL
UNITS. FOR EACH TAXI, WE
ASSIGNED GAS A TO TUESDAY OR
WEDNESDAY BY FLIPPING A COIN. IF
WE HADN'T, THE RESULTS COULD HAVE
BEEN RUINED BY DIFFERENCES
BETWEEN TUESDAY AND WEDNESDAY!
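
THE COIN-FLIP ASSIGNMENT DESCRIBED ABOVE MIGHT BE SKETCHED IN PYTHON
LIKE THIS. THE TEN CAB NUMBERS, THE FIXED SEED, AND THE FUNCTION NAME
ARE ALL ASSUMPTIONS MADE FOR THE SAKE OF A REPEATABLE ILLUSTRATION.

```python
import random


def assign_gas_by_coin_flip(cabs, rng):
    """For each cab (block), flip a fair coin to decide which day gets gas A.

    Every cab still receives both gasolines, one per day.
    """
    schedule = {}
    for cab in cabs:
        if rng.random() < 0.5:  # "heads": gas A on Tuesday
            schedule[cab] = {"TUESDAY": "A", "WEDNESDAY": "B"}
        else:                   # "tails": gas A on Wednesday
            schedule[cab] = {"TUESDAY": "B", "WEDNESDAY": "A"}
    return schedule


rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
schedule = assign_gas_by_coin_flip(range(1, 11), rng)
for cab, days in schedule.items():
    print(f"cab {cab}: {days}")
```

REPLICATION SHOWS UP AS THE TEN CABS, LOCAL CONTROL AS THE FACT THAT
EACH CAB GETS BOTH GASOLINES, AND RANDOMIZATION AS THE COIN FLIP.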







NOW SUPPOSE WE WANT TO INVESTIGATE THE EFFECT OF TWO BRANDS OF
TIRES AS WELL AS TWO GASOLINES. WE HAVE FOUR POSSIBLE TREATMENTS,
WHICH WE CAN LAY OUT IN A TWO-BY-TWO FACTORIAL DESIGN. THE TWO
FACTORS ARE GAS AND TIRE MAKE.


           GAS A    GAS B
TIRE A     a        b
TIRE B     c        d


WE CAN ASSIGN THE FOUR TREATMENTS AT RANDOM TO FOUR DIFFERENT DAYS
FOR EACH CAB. ALL FOUR TREATMENTS (a, b, c, AND d) ARE REPEATED
WITHIN EACH BLOCK (CAB). THIS IS CALLED A COMPLETE RANDOMIZED BLOCK
DESIGN.


SO FAR, WE HAVE
ASSUMED THAT EVERY
DAY OF THE WEEK IS
THE SAME, BUT WE CAN
CONTROL FOR THIS,
TOO, IN THE
FOLLOWING WAY: USE
ONLY FOUR CABS, AND
ASSIGN THE
TREATMENT ACCORDING
TO THE TABLE
BELOW:


            DAY
            1    2    3    4
CAB 1       a    b    c    d
CAB 2       b    c    d    a
CAB 3       c    d    a    b
CAB 4       d    a    b    c

A FOUR-BY-FOUR TABLE
WITH FOUR DIFFERENT
ELEMENTS, EACH APPEARING
ONCE IN EVERY COLUMN
AND ROW, IS CALLED A

Latin square.

IN THIS EXPERIMENT, THE
FOUR DAYS AND FOUR CABS
GET ALL FOUR TREATMENTS
EXACTLY ONCE.


THE RANDOMIZATION STEP
PICKS A SINGLE LATIN SQUARE
DESIGN AT RANDOM FROM A
LIST OF ALL POSSIBLE FOUR-WAY
LATIN SQUARES.


IF FOUR UNITS ISN'T ENOUGH, WE CAN INCREASE THE NUMBER OF
EXPERIMENTAL UNITS BY REPEATING THE EXPERIMENTAL DESIGN. STARTING
WITH EIGHT CABS, WE COULD DIVIDE THEM INTO TWO GROUPS OF FOUR AND
THEN REPEAT THE DESIGN WITHIN EACH GROUP.
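
HERE IS A MINIMAL PYTHON SKETCH OF BUILDING AND CHECKING A RANDOMIZED
LATIN SQUARE. NOTE THE SWAPPED-IN TECHNIQUE: INSTEAD OF PICKING FROM A
LIST OF ALL POSSIBLE SQUARES, IT SHUFFLES THE ROWS, COLUMNS, AND
LABELS OF ONE CYCLIC SQUARE, WHICH ALWAYS GIVES A VALID LATIN SQUARE
BUT DOES NOT NECESSARILY MAKE EVERY SQUARE EQUALLY LIKELY. THE
FUNCTION NAMES AND THE SEED ARE OURS.

```python
import random


def random_latin_square(treatments, rng):
    """Shuffle rows, columns and labels of the cyclic square (r + c) mod n."""
    n = len(treatments)
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows = list(range(n))
    cols = list(range(n))
    labels = list(treatments)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)
    return [[labels[base[r][c]] for c in cols] for r in rows]


def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({row[c] for row in square} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


rng = random.Random(7)  # fixed seed only so the sketch is repeatable
square = random_latin_square(["a", "b", "c", "d"], rng)
for cab, row in enumerate(square, start=1):
    print(f"CAB {cab}: treatments on days 1-4 -> {' '.join(row)}")
```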







- 

WE PROMISED NOT TO GO INTO THE DATA ANALYSIS IN ANY DETAIL, BUT HERE
IS ROUGHLY HOW A COMPLEX DESIGN LIKE THIS IS HANDLED.


EXPERIMENTAL DESIGNS ARE ANALYZED BY ALLOCATING TOTAL VARIABILITY
AMONG DIFFERENT SOURCES. IN THE CAB EXAMPLE, THE SOURCES OF
VARIABILITY ARE THE CAB, THE TIRE MAKE, GAS TYPE, DAY, AND RANDOM
ERROR. ANALYSIS OF VARIANCE, ANOVA FOR SHORT, PARTITIONS THE TOTAL
VARIATION, ALLOCATING PORTIONS TO EACH SOURCE.


IN THE NEXT CHAPTER, WE EXPLAIN IN
DETAIL ONE MODEL FOR ANALYZING
COMPLEX DESIGNS: THE LINEAR
REGRESSION MODEL. IN LINEAR
REGRESSION, YOU'LL BE ABLE TO SEE
ANOVA UP CLOSE AND NUMERICAL.

♦ Chapter 11 ♦

REGRESSION


SO FAR, WE'VE DONE STATISTICS ON ONE VARIABLE AT A TIME, WHETHER IT
CAME FROM A POPULATION OF PILL TAKERS, PICKLES, OR CRASHED CARS. IN
THIS CHAPTER, WE'LL SEE HOW TO RELATE TWO VARIABLES. GIVEN THE
WEIGHTS OF THE 92 STUDENTS IN CHAPTER 2, WE ASK HOW THEY ARE RELATED
TO THE STUDENTS' HEIGHTS.


THIS IS AN EXAMPLE OF A BROAD CLASS OF IMPORTANT QUESTIONS: DOES
BLOOD PRESSURE LEVEL PREDICT LIFE EXPECTANCY? DO S.A.T. SCORES
PREDICT COLLEGE PERFORMANCE? DOES READING STATISTICS BOOKS MAKE
YOU A BETTER PERSON?
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IN MATH CLASS, YOU PROBABLY 
LEARNED TO SEE RELATIONSHIPS
DISPLAYED AS GRAPHS: GIVEN $x$,
YOU CAN PREDICT $y$.

BUT IN STATISTICS, THINGS
ARE NEVER SO CLEAN! WE
KNOW (OR SUPPOSE WE
KNOW) THAT HEIGHT HAS
AN INFLUENCE ON WEIGHT,
BUT IT'S NOT THE SOLE
INFLUENCE. THERE ARE
OTHER FACTORS, TOO, LIKE
SEX, AGE, BODY TYPE, AND
RANDOM VARIATION.


FOR THIS CHAPTER, LET'S LABEL THE WEIGHT DATA AS $y$ AND THE HEIGHT DATA
AS $x$. THUS $(x_i, y_i)$ ARE THE HEIGHT AND WEIGHT OF STUDENT $i$. WE DISPLAY
THE POINTS $(x_i, y_i)$ IN A 2-DIMENSIONAL DOT PLOT CALLED A SCATTERPLOT.


(SOME OF THE DOTS ARE BIGGER, BECAUSE THEY REPRESENT TWO OR THREE
STUDENTS WITH THE SAME HEIGHT AND WEIGHT.)


CAN WE PREDICT A STUDENT'S WEIGHT $y$ FROM HIS OR HER HEIGHT $x$?

Regression analysis


FITS A STRAIGHT LINE TO
THIS MESSY SCATTERPLOT.

$x$ IS CALLED THE
INDEPENDENT OR
PREDICTOR VARIABLE, AND
$y$ IS THE DEPENDENT OR
RESPONSE VARIABLE. THE
REGRESSION OR PREDICTION
LINE HAS THE FORM

$y = a + bx$


TO ILLUSTRATE THE FITTING PROCESS, LET'S USE A SMALLER, RIGGED DATA SET
WITH ONLY NINE STUDENT HEIGHT-WEIGHT PAIRS.


HEIGHT    WEIGHT
60        84
62        95
64        140
66        155
68        119
70        175
72        145
74        197
76        150

[SCATTERPLOT OF THE NINE POINTS: HEIGHT (60 TO 75) ON THE HORIZONTAL AXIS, WEIGHT (50 TO 250) ON THE VERTICAL AXIS]

NOW HOW DO WE GET THE BEST-FITTING LINE?







THE IDEA IS TO MINIMIZE
THE TOTAL SPREAD OF THE
$y$ VALUES FROM THE LINE.
JUST AS WHEN WE DEFINED
THE VARIANCE, WE LOOK AT
ALL THE SQUARED $y$
DISTANCES FROM THE LINE,
AND ADD THEM UP TO GET
THE SUM OF SQUARED
ERRORS:

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ = SUM OF THESE SQUARES


IT'S AN AGGREGATE MEASURE OF HOW MUCH THE LINE'S "PREDICTED $y_i$",
OR $\hat{y}_i$, DIFFER FROM THE ACTUAL DATA VALUES $y_i$.


The regression or
least squares line

IS THE LINE WITH THE SMALLEST SSE.


(SHALL WE JUST
MEASURE IT FOR
EVERY LINE?)


HISTORICAL NOTE: WHY DO WE CALL THIS PROCEDURE REGRESSION
ANALYSIS? AROUND THE TURN OF THE CENTURY, GENETICIST FRANCIS
GALTON DISCOVERED A PHENOMENON CALLED REGRESSION TOWARD
THE MEAN. SEEKING LAWS OF INHERITANCE, HE FOUND THAT SONS'
HEIGHTS TENDED TO REGRESS TOWARD THE MEAN HEIGHT OF THE
POPULATION, COMPARED TO THEIR FATHERS' HEIGHTS. TALL FATHERS
TENDED TO HAVE SOMEWHAT SHORTER SONS, AND VICE VERSA. GALTON
DEVELOPED REGRESSION ANALYSIS TO STUDY THIS EFFECT, WHICH HE
OPTIMISTICALLY REFERRED TO AS "REGRESSION TOWARD MEDIOCRITY."


(GROW UP,
BOY!)


(YOU JUST
TAKE THE
VECTOR $y - \bar{y}$
AND PROJECT IT
ONTO THE $x$ VECTOR,
AND...)


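NOT TO BEAT AROUND THE BUSH, WE
GIVE WITHOUT PROOF THE REGRESSION
LINE'S FORMULA: IT'S MESSY BUT
COMPUTABLE.


$y = a + bx$


WHERE

$b = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

$a = \bar{y} - b\bar{x}$


(HERE $\bar{x}$ AND $\bar{y}$ ARE THE MEANS OF
$\{x_i\}$ AND $\{y_i\}$ RESPECTIVELY.)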


BECAUSE SOME OF THESE EXPRESSIONS WILL SHOW UP AGAIN, WE ABBREVIATE
THEM:

$SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$

$SS_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$

SUM OF SQUARES AROUND
THE MEAN. THESE MEASURE
THE SPREAD OF $x_i$ AND $y_i$.

$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

THE CROSS PRODUCT DETERMINES
(WITH $SS_{xx}$) THE COEFFICIENT $b$.

FOR THE RIGGED DATA, HERE'S THE WHOLE COMPUTATION:

x      y      x - x̄    y - ȳ    (x - x̄)²    (y - ȳ)²    (x - x̄)(y - ȳ)
60     84     -8       -56      64          3136        448
62     95     -6       -45      36          2025        270
64     140    -4         0      16             0          0
66     155    -2        15       4           225        -30
68     119     0       -21       0           441          0
70     175     2        35       4          1225         70
72     145     4         5      16            25         20
74     197     6        57      36          3249        342
76     150     8        10      64           100         80

SUM = 612 ($\bar{x} = 68$), SUM = 1260 ($\bar{y} = 140$),
$SS_{xx} = 240$, $SS_{yy} = 10{,}426$, $SS_{xy} = 1200$


WHICH GIVES VALUES OF $a$ AND $b$:

$b = \dfrac{1200}{240} = 5$, $a = \bar{y} - b\bar{x} = 140 - 5(68) = -200$

SO $y = -200 + 5x$
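
THE WHOLE COMPUTATION ABOVE FITS IN A FEW LINES OF PYTHON. THE NINE
HEIGHT-WEIGHT PAIRS ARE THE RIGGED DATA AS WE READ THEM, AND THE
FUNCTION NAME `least_squares_line` IS ONLY A LABEL FOR THIS SKETCH.

```python
# Rigged data set: heights (x) and weights (y) of nine students.
heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]


def least_squares_line(x, y):
    """Return (a, b, ss_xx, ss_yy, ss_xy) for the least squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b, ss_xx, ss_yy, ss_xy


a, b, ss_xx, ss_yy, ss_xy = least_squares_line(heights, weights)
print(f"SSxx = {ss_xx}, SSyy = {ss_yy}, SSxy = {ss_xy}")
print(f"regression line: y = {a} + {b} x")
```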
ANOVA

(AS PROMISED, OR THREATENED...)
NOW WE ASK: IF THIS IS THE
BEST FIT, HOW GOOD IS IT?

"IN TECHNICAL TERMS, HOW BAD IS THE SLOP?"

"MISSED!"

AS YOU CAN IMAGINE, THE ANSWER TO THIS QUESTION DEPENDS ON HOW
SLOPPILY THE DATA POINTS ARE SPREAD OUT, I.E., HOW BIG SSE IS, RELATIVE
TO THE TOTAL SPREAD OF THE DATA. SOME EXAMPLES:

GOOD FIT: MODERATE SSE, BUT LARGE TOTAL SPREAD
LET'S QUANTIFY THIS BY APPORTIONING THE VARIABILITY IN y. REFER TO
THE PICTURE AT RIGHT FOR GUIDANCE. WE LET

$$\hat{y}_i = a + b x_i$$

THUS, THE $\hat{y}_i$ ARE THE PREDICTED WEIGHTS DETERMINED BY THE
REGRESSION LINE.

ANOVA table

SOURCE OF VARIABILITY   SUM OF SQUARES                                 VALUE FOR RIGGED DATA
REGRESSION              $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$          6,000
ERROR                   $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$              4,426
TOTAL                   $SS_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$           10,426


(BY THE WAY, IT IS NOT OBVIOUS THAT $SS_{yy} = SSR + SSE$, BUT IT'S TRUE!)
ANYWAY, HERE IS HOW THE REGRESSION AND ERROR SUMS OF SQUARES ARE
CALCULATED FOR THE RIGGED DATA SET, WITH $\hat{y} = -200 + 5x$.

                         REGRESSION                      ERROR
  x     y    yhat    yhat - ybar   (yhat - ybar)^2    y - yhat   (y - yhat)^2
 60    84    100        -40             1600            -16          256
 62    95    110        -30              900            -15          225
 64   140    120        -20              400             20          400
 66   155    130        -10              100             25          625
 68   119    140          0                0            -21          441
 70   175    150         10              100             25          625
 72   145    160         20              400            -15          225
 74   197    170         30              900             27          729
 76   150    180         40             1600            -30          900

ybar = 140

SSR = 6000

SSE = 4426
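
AS A CHECK ON THE TABLE, HERE IS A SHORT PYTHON SKETCH THAT RECOMPUTES THE THREE SUMS OF SQUARES AND CONFIRMS THAT THEY ADD UP. THE VARIABLE NAMES ARE ASSUMPTIONS MADE FOR ILLUSTRATION; THE DATA AND THE FITTED LINE ($a = -200$, $b = 5$) COME FROM THE TEXT.

```python
heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]
a, b = -200.0, 5.0  # fitted regression line from the text

y_bar = sum(weights) / len(weights)
predicted = [a + b * x for x in heights]

# Variability explained by the regression line.
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
# Variability left over in the residuals.
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(weights, predicted))
# Total variability of the weights around their mean.
ss_yy = sum((y - y_bar) ** 2 for y in weights)

print(ssr, sse, ss_yy)  # prints: 6000.0 4426.0 10426.0
```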
SSR MEASURES THE TOTAL
VARIABILITY DUE TO THE
REGRESSION, I.E., THE
PREDICTED VALUES OF y.

SSE WE'VE ALREADY MET.
NOTE THAT

$$\frac{SSE}{SS_{yy}}$$

IS THE PROPORTION OF
ERROR, RELATIVE TO
THE TOTAL SPREAD.

"A NUMERICAL EXPRESSION FOR THE 'SLOP'!"

The squared correlation

IS THE PROPORTION OF THE TOTAL $SS_{yy}$
ACCOUNTED FOR BY THE REGRESSION:

$$R^2 = \frac{SSR}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}$$

(BECAUSE $SSR = SS_{yy} - SSE$.) $R^2$ IS
NEVER GREATER THAN 1. THE CLOSER IT
IS TO 1, THE TIGHTER THE FIT OF
THE CURVE. $R^2 = 1$ CORRESPONDS
TO PERFECT FIT.

CALCULATING $R^2$ FOR THE
RIGGED DATA SET, WE GET

$$R^2 = \frac{6000}{10{,}426} = .58$$

58% OF THE VARIATION IN
WEIGHT IS EXPLAINED BY
HEIGHT. THE OTHER 42%
IS "ERROR."




ALTERNATIVELY, THE

correlation coefficient

IS THE SQUARE ROOT OF $R^2$ WITH
THE SIGN OF b.

$$r = (\text{SIGN OF } b)\sqrt{R^2}$$

"NEGATIVE r MEANS THAT x IS NEGATIVELY RELATED TO y."

THUS, r IS + IF THE LINE GOES
UP TO THE RIGHT AND - IF IT
GOES DOWN TO THE RIGHT.

[FIGURE: EXAMPLE SCATTERPLOTS, INCLUDING $r = -0.9$ AND $r \approx 0$]
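
A MINIMAL PYTHON SKETCH OF THESE TWO DEFINITIONS, USING THE RIGGED-DATA VALUES $SSR = 6000$, $SS_{yy} = 10426$ AND $b = 5$ FROM THE TEXT. THE VARIABLE NAMES ARE ASSUMPTIONS; `math.copysign` IS SIMPLY ONE CONVENIENT WAY TO ATTACH THE SIGN OF b.

```python
import math

ssr = 6000.0     # regression sum of squares (rigged data)
ss_yy = 10426.0  # total sum of squares (rigged data)
b = 5.0          # fitted slope

# Proportion of the total variability accounted for by the regression.
r_squared = ssr / ss_yy

# Correlation coefficient: square root of R^2 carrying the sign of b.
r = math.copysign(math.sqrt(r_squared), b)

print(f"R^2 = {r_squared:.2f}, r = {r:.2f}")  # prints: R^2 = 0.58, r = 0.76
```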






NOW LET'S BE
HONEST: NOBODY
(WELL, ALMOST
NOBODY) DOES
THESE CALCULATIONS
BY HAND ANYMORE.
WITH A COMPUTER,
ALL THIS WORK CAN
BE DONE IN ONE
LINE OF CODE...

USING THE MINITAB STATISTICAL SOFTWARE SYSTEM, DEVELOPED AT PENN
STATE, THE SINGLE COMMAND LOOKS LIKE THIS:

    MTB > regress 'weight' on 1 independent variable 'height'

AND THE RESULTS ARE

    The regression equation is
    weight = - 200 + 5.00 height

    Predictor     Coef    Stdev   t-ratio       p
    Constant    -200.0    110.7     -1.81   0.114
    height       5.000    1.623      3.08   0.018

    s = 25.15     R-sq = 57.5%     R-sq(adj) = 51.5%

    Analysis of Variance

    SOURCE       DF        SS       MS      F
    Regression    1    6000.0   6000.0   9.49
    Error         7    4426.0    632.3
    Total         8   10426.0
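
THE MS, F AND R-sq(adj) ENTRIES IN THIS OUTPUT ARE NOT DERIVED IN THE TEXT. THE SKETCH BELOW SHOWS THE STANDARD WAY SUCH ENTRIES ARE COMPUTED FROM THE SUMS OF SQUARES (MEAN SQUARE = SS / DF, F = RATIO OF MEAN SQUARES). IT IS AN ASSUMPTION THAT MINITAB USES EXACTLY THESE FORMULAS, THOUGH THE NUMBERS AGREE WITH THE PRINTOUT; THE VARIABLE NAMES ARE ILLUSTRATIVE.

```python
n = 9                       # number of data points in the rigged sample
ssr, sse = 6000.0, 4426.0   # regression and error sums of squares
ss_total = ssr + sse

df_regression = 1           # one predictor (height)
df_error = n - 2            # two degrees of freedom used up by a and b

ms_regression = ssr / df_regression
ms_error = sse / df_error
f_ratio = ms_regression / ms_error

r_sq = ssr / ss_total
# Adjusted R^2 compares mean squares instead of raw sums of squares.
r_sq_adj = 1 - (sse / df_error) / (ss_total / (n - 1))

print(f"MS error = {ms_error:.1f}, F = {f_ratio:.2f}")
print(f"R-sq = {100 * r_sq:.1f}%, R-sq(adj) = {100 * r_sq_adj:.1f}%")
```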









NOW LET'S DO IT TO THE REAL
DATA OF 92 STUDENTS:

    MTB > regress 'weight' on 1 independent variable 'height'

AND THE RESULTS:

    The regression equation is
    weight = - 205 + 5.09 height

    Predictor      Coef     Stdev   t-ratio       p
    Constant    -204.74     29.16     -7.02   0.000
    height       5.0918    0.4237     12.02   0.000

    s = 14.79     R-sq = 61.6%     R-sq(adj) = 61.2%

    Analysis of Variance

    SOURCE       DF      SS      MS        F       p
    Regression    1   31592   31592   144.38   0.000
    Error        90   19692     219
    Total        91   51284

HERE IS THE
SCATTERPLOT WITH
THE FITTED
REGRESSION LINE.

THE CORRELATION
COEFFICIENT FOR THIS
DATA SET IS

r = .78

[FIGURE: SCATTERPLOT OF WEIGHT VERSUS HEIGHT WITH FITTED LINE]
STATISTICAL

INFERENCE

"WHAT HAPPENED TO YOU?"

UP TO NOW, WE HAVE BEEN
DOING DATA ANALYSIS,
DESCRIBING THE NEAREST LINEAR
RELATIONSHIP BETWEEN THE
OBSERVED DATA x AND y. NOW
LET'S SHIFT OUR POINT OF VIEW,
AND REGARD THE 92 STUDENTS
AS A SAMPLE OF THE
POPULATION OF STUDENTS AT
LARGE. WHAT CAN WE INFER?

A REGRESSION MODEL FOR THE WHOLE POPULATION IS A LINEAR
RELATIONSHIP

$$Y = \alpha + \beta x + \epsilon$$

$Y$ IS THE DEPENDENT RANDOM VARIABLE; $x$ IS THE INDEPENDENT VARIABLE
(WHICH MAY OR MAY NOT BE RANDOM); $\alpha$ AND $\beta$ ARE THE UNKNOWN
PARAMETERS WE SEEK TO ESTIMATE; AND $\epsilon$ REPRESENTS RANDOM ERROR
FLUCTUATIONS.

"NOTE GREEK LETTERS TO INDICATE MODEL-DOM!"

FOR THE HEIGHT
VS. WEIGHT MODEL,
$Y$ IS WEIGHT, $x$ IS
HEIGHT, $\alpha$ AND $\beta$
ARE UNKNOWN, AND
YOU CAN THINK OF
$\epsilon$ AS THE RANDOM
COMPONENT OF
THE WEIGHTS $Y$
FOR EACH VALUE
OF HEIGHT $x$.
THE DISTRIBUTION OF $\epsilon$ IS IN FACT DIFFERENT FOR DIFFERENT VALUES OF $x$:
5-FOOTERS VARY LESS IN THEIR WEIGHT THAN 6-FOOTERS. NEVERTHELESS, WE
NOW MAKE A SIMPLIFYING ASSUMPTION. LET'S SUPPOSE THAT FOR ALL VALUES
OF $x$, THE $\epsilon$'S ARE INDEPENDENT, NORMAL, AND HAVE THE SAME STANDARD
DEVIATION $\sigma = \sigma(\epsilon)$ AND MEAN $\mu = 0$.

SO, MAYBE THE WEIGHT
MODEL MIGHT BE

$$Y = -125 + 4x + \epsilon$$

$\epsilon$ IS NORMAL WITH $\mu = 0$
AND $\sigma = 15$ POUNDS.
THEN, ACCORDING TO THIS
MODEL, STUDENTS WHO ARE
6'4" (76 INCHES) HAVE THE
DISTRIBUTION OF

$$Y = -125 + 4(76) + \epsilon = 179 + \epsilon$$

SO, FOR $x = 76$, $Y$ IS NORMAL
WITH MEAN 179 AND STANDARD
DEVIATION 15 POUNDS.
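
TO SEE WHAT THIS MODEL SAYS, WE CAN SIMULATE IT. THE PYTHON SKETCH BELOW DRAWS WEIGHTS FOR STUDENTS OF HEIGHT 76 INCHES UNDER THE HYPOTHETICAL MODEL $Y = -125 + 4x + \epsilon$ WITH $\sigma = 15$. THE FUNCTION NAME `simulate_weights`, THE SAMPLE SIZE AND THE SEED ARE ASSUMPTIONS CHOSEN FOR ILLUSTRATION.

```python
import random
import statistics


def simulate_weights(height, n, rng, alpha=-125.0, beta=4.0, sigma=15.0):
    """Draw n weights from Y = alpha + beta*height + epsilon,
    where epsilon is normal with mean 0 and standard deviation sigma."""
    return [alpha + beta * height + rng.gauss(0.0, sigma) for _ in range(n)]


rng = random.Random(1)  # fixed seed so the run is repeatable
sample = simulate_weights(76, 10_000, rng)

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
# Both should land close to the model values: mean 179, SD 15.
print(f"mean = {sample_mean:.1f}, sd = {sample_sd:.1f}")
```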










NOW, GIVEN THE MODEL $Y = \alpha + \beta x + \epsilon$, WE WANT TO DO AS WE'VE DONE
REPEATEDLY IN THE LAST FEW CHAPTERS: TAKE A SAMPLE AND USE IT TO
ESTIMATE $\alpha$ AND $\beta$.

ONE CAN SHOW THAT THE
a AND b WE GOT BY THE
LEAST-SQUARES METHOD
ARE BLUE: THE BEST
LINEAR UNBIASED
ESTIMATORS OF $\alpha$ AND $\beta$
(WHATEVER THAT MEANS!).

AS USUAL, DIFFERENT SAMPLES YIELD DIFFERENT COLLECTIONS OF DATA,
WHICH GENERATE DIFFERENT REGRESSION LINES. THESE LINES ARE
DISTRIBUTED AROUND THE LINE $y = \alpha + \beta x$. OUR QUESTION BECOMES:
HOW ARE a AND b DISTRIBUTED AROUND $\alpha$ AND $\beta$, RESPECTIVELY, AND HOW
DO WE CONSTRUCT CONFIDENCE INTERVALS AND TEST HYPOTHESES?


FOR EACH DATA POINT $(x_i, y_i)$,
WE HAVE

$$y_i = a + b x_i + e_i$$

WHERE $e_i = y_i - \hat{y}_i$ IS
THE y-DISTANCE OF $y_i$
FROM THE REGRESSION
LINE. THE $e_i$ ARE SAMPLE
VALUES OF $\epsilon$, AND THEY
GIVE US AN ESTIMATOR s
FOR $\sigma(\epsilon)$:

$$s = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}} = \sqrt{\frac{SSE}{n - 2}}$$

(WHY n-2 IN THE DENOMINATOR? BECAUSE WE HAVE USED UP TWO DEGREES
OF FREEDOM TO COMPUTE a AND b, LEAVING n-2 INDEPENDENT PIECES OF
INFORMATION TO ESTIMATE $\sigma$.)

ALTHOUGH IT ISN'T OBVIOUS,
WE CAN ALSO WRITE s AS

$$s = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}}$$

"LEARN n-DIMENSIONAL GEOMETRY, I TELL YOU, AND IT'S EASY!"

A FORMULA WHICH ALLOWS
US TO COMPUTE s
DIRECTLY FROM THE
SAMPLE STATISTICS.
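
THE TWO EXPRESSIONS FOR s CAN BE COMPARED NUMERICALLY. THIS PYTHON SKETCH COMPUTES s FOR THE RIGGED DATA BOTH FROM THE RESIDUALS AND FROM THE SUMS OF SQUARES; THE VARIABLE NAMES ARE ASSUMPTIONS, AND THE SECOND FORM RELIES ON THE IDENTITY $SSE = SS_{yy} - b\,SS_{xy}$.

```python
import math

heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n
ss_xx = sum((x - x_bar) ** 2 for x in heights)
ss_yy = sum((y - y_bar) ** 2 for y in weights)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

b = ss_xy / ss_xx
a = y_bar - b * x_bar

# Form 1: from the residuals e_i = y_i - (a + b*x_i).
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]
s_from_residuals = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# Form 2: directly from the sample sums of squares.
s_from_sums = math.sqrt((ss_yy - b * ss_xy) / (n - 2))

print(f"{s_from_residuals:.2f} {s_from_sums:.2f}")  # prints: 25.15 25.15
```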






confidence intervals

THE 95% CONFIDENCE INTERVALS
FOR $\alpha$ AND $\beta$ HAVE THAT OLD
FAMILIAR FORM:

$$\beta = b \pm t_{.025}\,SE(b)$$

$$\alpha = a \pm t_{.025}\,SE(a)$$

WHERE WE USE THE t DISTRIBUTION
WITH n-2 DEGREES OF FREEDOM
(FOR THE SAME REASON AS ABOVE).

THE STANDARD ERRORS, HOWEVER, LOOK RATHER UNFAMILIAR. THEY ARE
(WITHOUT DERIVATION):

$$SE(b) = \frac{s}{\sqrt{SS_{xx}}}$$

$$SE(a) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}}$$

"YES... LOOKS LIKE THE CYANIDE-LACED ALMOND TORTE FROM THE MYSTERY
OF THE DEVIL'S DENOMINATOR."

WHAT HAPPENED TO OUR PREVIOUS $\sqrt{n}$? IT WAS REPLACED BY $\sqrt{SS_{xx}}$. LIKE n,
$SS_{xx}$ INCREASES AS WE ADD MORE DATA POINTS, BUT IT ALSO REFLECTS THE
TOTAL SPREAD OF THE x DATA. FOR EXAMPLE, IF ALL STUDENTS SAMPLED
HAD THE SAME HEIGHT, WE WOULD BE UNJUSTIFIED IN DRAWING ANY
CONCLUSION ABOUT THE DEPENDENCE OF WEIGHT ON HEIGHT. IN THAT CASE,
$SS_{xx} = 0$, GIVING $SE(b) = \infty$ AND INFINITELY WIDE CONFIDENCE INTERVALS.
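
HERE IS A ROUGH PYTHON SKETCH OF THESE INTERVALS FOR THE RIGGED DATA. THE CRITICAL VALUE 2.365 ($t_{.025}$ WITH 7 DEGREES OF FREEDOM) IS THE ONE QUOTED IN THE TEXT RATHER THAN COMPUTED, AND THE VARIABLE NAMES ARE ASSUMPTIONS.

```python
import math

# Sample statistics for the rigged data, taken from the text.
n = 9
x_bar = 68.0
ss_xx = 240.0
s = 25.15
a, b = -200.0, 5.0
t_crit = 2.365  # t_.025 with n - 2 = 7 degrees of freedom (from the text)

# Standard errors of the slope and the intercept.
se_b = s / math.sqrt(ss_xx)
se_a = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)

# 95% confidence intervals: estimate +/- t * standard error.
ci_b = (b - t_crit * se_b, b + t_crit * se_b)
ci_a = (a - t_crit * se_a, a + t_crit * se_a)

print(f"SE(b) = {se_b:.3f}, SE(a) = {se_a:.1f}")
print(f"beta in  ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
print(f"alpha in ({ci_a[0]:.1f}, {ci_a[1]:.1f})")
```

NOTE HOW WIDE THE INTERVAL FOR $\alpha$ IS: WITH ONLY NINE POINTS, ALL FAR FROM $x = 0$, THE INTERCEPT IS POORLY PINNED DOWN.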


[FIGURE: WEIGHT VERSUS HEIGHT WHEN ALL x ARE THE SAME]



MORE QUESTIONS:

HOW WELL CAN WE PREDICT THE MEAN RESPONSE $Y$ AT A FIXED VALUE $x_0$?
FOR INSTANCE, WHAT IS THE MEAN WEIGHT OF STUDENTS OF HEIGHT 76 INCHES?
THE 95% CONFIDENCE INTERVAL FOR $Y = \alpha + \beta x_0$ IS

$$\alpha + \beta x_0 = a + b x_0 \pm t_{.025}\,SE(\hat{y})$$

WHERE

$$SE(\hat{y}) = s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$$

SUPPOSE A NEW STUDENT ENROLLS, WHO HAS HEIGHT $x_{NEW}$. HOW WELL CAN
WE PREDICT $Y_{NEW}$ WITHOUT MEASURING IT?

THE 95% PREDICTION INTERVAL
FOR A NEW INDIVIDUAL $Y_{NEW}$
WITH OBSERVED $x_{NEW}$ IS

$$Y_{NEW} = a + b x_{NEW} \pm t_{.025}\,SE(Y_{NEW})$$

WHERE

$$SE(Y_{NEW}) = s\sqrt{1 + \frac{1}{n} + \frac{(x_{NEW} - \bar{x})^2}{SS_{xx}}}$$

[FIGURE: WEIGHT VERSUS HEIGHT (60 TO 75 INCHES) WITH CONFIDENCE AND
PREDICTION BANDS AROUND THE REGRESSION LINE]

BOTH THESE STANDARD ERRORS CONTAIN A TERM
THAT GROWS LARGER AS THE x-VALUE, $x_0$ OR
$x_{NEW}$, GETS FARTHER FROM THE MEAN VALUE $\bar{x}$.
WHY DOES THE ERROR INCREASE FARTHER FROM $\bar{x}$?

BECAUSE, IF YOU WIGGLE THE REGRESSION
LINE, IT MAKES MORE OF A DIFFERENCE FARTHER
FROM THE MEAN! (REMEMBER, THE LINE ALWAYS
PASSES THROUGH $(\bar{x}, \bar{y})$.)

[FIGURE: WIGGLED REGRESSION LINES, WITH LARGE ERROR FAR FROM THE MEAN
AND SMALL ERROR NEAR IT]





LET'S WORK IT OUT FOR THE
RIGGED DATA: FOR THE MEAN
WEIGHT WHEN $x = 76$ INCHES,
WE HAVE $a = -200$ AND $b = 5$.
THEN

$$\hat{y} = -200 + 5(76) \pm (2.365)(25.15)\sqrt{\frac{1}{9} + \frac{(76 - 68)^2}{240}}$$

$$= 180 \pm (2.365)(25.15)\sqrt{.3778}$$

$$= 180 \pm 36.56 \text{ POUNDS}$$

THE ESTIMATED MEAN OF
6'4" STUDENTS IS 180
POUNDS, AND WE'RE 95%
CONFIDENT THAT WE'RE
WITHIN ABOUT 37 POUNDS OF
THE TRUE MEAN.

FOR A NEW STUDENT WHO'S 6'4", WE USE OUR RIGGED SAMPLE OF NINE
POINTS TO PREDICT THAT

$$Y_{NEW} = -200 + 5(76) \pm (2.365)(25.15)\sqrt{1 + \frac{1}{9} + \frac{64}{240}}$$

$$= 180 \pm (2.365)(29.52)$$

$$= 180 \pm 69.8 \text{ POUNDS}$$

"WE TELL THE FOOTBALL COACH THAT WE'RE PRETTY SURE THE NEW GUY
WEIGHS SOMEWHERE BETWEEN 110 AND 250!!!"
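
THE TWO INTERVALS DIFFER ONLY BY THE EXTRA "1 +" UNDER THE SQUARE ROOT, WHICH A SHORT PYTHON SKETCH MAKES PLAIN. THE FUNCTION NAME `interval_half_widths` IS AN ASSUMPTION; THE INPUTS ($n = 9$, $\bar{x} = 68$, $SS_{xx} = 240$, $s = 25.15$, $t_{.025} = 2.365$) ARE THE RIGGED-DATA VALUES FROM THE TEXT.

```python
import math


def interval_half_widths(x0, n, x_bar, ss_xx, s, t_crit):
    """Return the half-widths of the confidence interval for the mean
    response and of the prediction interval for a new individual at x0."""
    spread_term = 1 / n + (x0 - x_bar) ** 2 / ss_xx
    mean_half_width = t_crit * s * math.sqrt(spread_term)
    prediction_half_width = t_crit * s * math.sqrt(1 + spread_term)
    return mean_half_width, prediction_half_width


center = -200 + 5 * 76  # fitted value a + b*x0 at 76 inches
mean_hw, pred_hw = interval_half_widths(76, 9, 68.0, 240.0, 25.15, 2.365)

print(f"mean response: {center} +/- {mean_hw:.1f}")
print(f"new student:   {center} +/- {pred_hw:.1f}")
```

TRYING `x0 = 68` INSTEAD SHOWS BOTH INTERVALS AT THEIR NARROWEST, SINCE THE $(x_0 - \bar{x})^2$ TERM VANISHES AT THE MEAN.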
THE INTERVALS ARE PRETTY TERRIBLE! WHAT'S THE PROBLEM? THERE ARE
TWO PROBLEMS, ACTUALLY:

HEIGHT ALONE IS NOT A VERY GOOD
PREDICTOR OF WEIGHT.

"THERE'S THE EFFECT OF GENDER AND GENETICS..."

NINE DATA POINTS WEREN'T ENOUGH.

IN PARTICULAR, THERE WAS ONLY ONE
STUDENT WITH HEIGHT 76 INCHES.

"ANY MORE OF US OUT THERE?"

THE PENN STATE STUDENTS GIVE BETTER ESTIMATES.

[FIGURE: WEIGHT VERSUS HEIGHT (60 TO 75 INCHES) FOR THE PENN STATE
STUDENTS, WITH INTERVAL BANDS]

"...HAS NO EFFECT ON..."

hypothesis testing

THE COMPLETE SKEPTIC MIGHT
SUGGEST THAT THERE IS NO
RELATIONSHIP BETWEEN HEIGHT
AND WEIGHT. THIS AMOUNTS TO
SAYING THAT $\beta = 0$.

WE TAKE THIS AS THE NULL
HYPOTHESIS.

$$H_0: \beta = 0$$

IN THAT CASE, THE TEST STATISTIC

$$t = \frac{b}{SE(b)}$$

HAS THE t DISTRIBUTION WITH
n-2 DEGREES OF FREEDOM.

AS USUAL, THE SIGNIFICANCE TEST
DEPENDS ON THE ALTERNATE
HYPOTHESIS. WE REJECT $H_0$ IF

$t > t_{\alpha}$ FOR $H_a: \beta > 0$
$t < -t_{\alpha}$ FOR $H_a: \beta < 0$
$|t| > t_{\alpha/2}$ FOR $H_a: \beta \neq 0$

FOR THE RIGGED WEIGHT DATA, WE
STRONGLY SUSPECT THE ALTERNATE
HYPOTHESIS SHOULD BE

$$H_a: \beta > 0$$

WE TEST

$$t_{OBS} = \frac{b}{SE(b)} = \frac{5}{1.623} = 3.08$$

FOR 7 DEGREES OF FREEDOM,
$t_{.05} = 1.895$. SINCE $t_{OBS} > t_{.05}$, WE
REJECT THE NULL HYPOTHESIS AT THE
$\alpha = .05$ SIGNIFICANCE LEVEL AND
CONCLUDE THAT THERE IS A
SIGNIFICANT, POSITIVE RELATIONSHIP
BETWEEN HEIGHT AND WEIGHT.
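
THE DECISION RULE CAN BE SKETCHED AS A SMALL PYTHON FUNCTION. THE NAME `reject_null` AND ITS `alternative` LABELS ARE ASSUMPTIONS MADE FOR ILLUSTRATION; THE CALLER IS EXPECTED TO SUPPLY THE APPROPRIATE CRITICAL VALUE ($t_{\alpha}$ FOR A ONE-SIDED TEST, $t_{\alpha/2}$ FOR A TWO-SIDED ONE) FROM A t TABLE.

```python
def reject_null(t_obs, t_crit, alternative):
    """Decide whether to reject H0: beta = 0.

    t_crit must be a positive critical value from the t table with
    n - 2 degrees of freedom: t_alpha for the one-sided alternatives,
    t_(alpha/2) for the two-sided one.
    """
    if alternative == "greater":    # Ha: beta > 0
        return t_obs > t_crit
    if alternative == "less":       # Ha: beta < 0
        return t_obs < -t_crit
    if alternative == "two-sided":  # Ha: beta != 0
        return abs(t_obs) > t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Rigged data: b = 5, SE(b) = 1.623, and t_.05 = 1.895 for 7 df (from the text).
t_obs = 5 / 1.623
decision = reject_null(t_obs, 1.895, "greater")
print(f"t = {t_obs:.2f}, reject H0: {decision}")  # prints: t = 3.08, reject H0: True
```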



Multiple linear

regression

WE CAN USE THE SAME BASIC
IDEAS TO ANALYZE
RELATIONSHIPS BETWEEN A
DEPENDENT VARIABLE AND
SEVERAL INDEPENDENT
VARIABLES.

"DON'T YOU SEE? IT'S JUST AN AFFINE n-1 DIMENSIONAL HYPERPLANE
IN n-SPACE! NOTHING TO IT!"

$$Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon$$

FOR EXAMPLE, WEIGHT IS
DETERMINED BY A NUMBER
OF FACTORS OTHER THAN
HEIGHT: AGE, SEX, DIET, BODY
TYPE, ETC.

MATRIX ALGEBRA AND A COMPUTER COMBINE TO MAKE SUCH PROBLEMS EASY
TO ANALYZE.


Non-linear

regression

SOMETIMES DATA OBVIOUSLY
FIT A NON-LINEAR CURVE.
STATISTICIANS HAVE A BAG OF
TRICKS FOR USING LINEAR
REGRESSION TECHNIQUES FOR
NON-LINEAR PROBLEMS. THE
SIMPLEST OF THESE IS TO
WRITE $Y$ AS A POLYNOMIAL

$$Y = \alpha + \beta_1 x + \beta_2 x^2 + \epsilon$$

AND TREAT $x$ AND $x^2$ AS
INDEPENDENT VARIABLES IN A
LINEAR MODEL.
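
THE POLYNOMIAL TRICK CAN BE SKETCHED IN PLAIN PYTHON. THIS IS AN ILLUSTRATION UNDER STATED ASSUMPTIONS, NOT THE BOOK'S PROCEDURE: IT BUILDS THE COLUMNS $1$, $x$, $x^2$ AND SOLVES THE LEAST-SQUARES NORMAL EQUATIONS BY GAUSSIAN ELIMINATION. THE FUNCTION NAMES `solve` AND `fit_quadratic` AND THE NOISE-FREE TEST DATA ARE HYPOTHETICAL.

```python
def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - known) / m[r][r]
    return z


def fit_quadratic(xs, ys):
    """Least-squares fit of y = alpha + beta1*x + beta2*x^2.

    x and x^2 are treated as two separate independent variables."""
    design = [[1.0, x, x * x] for x in xs]
    # Normal equations: (X^T X) coefficients = X^T y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
alpha, beta1, beta2 = fit_quadratic(xs, ys)
print(f"{alpha:.3f} {beta1:.3f} {beta2:.3f}")
```

THE SAME `design` IDEA EXTENDS TO MULTIPLE REGRESSION: EACH EXTRA INDEPENDENT VARIABLE IS JUST ANOTHER COLUMN.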





Regression diagnostics

FITTING A COMPLEX MODEL TO DATA CAN SOMETIMES OBSCURE MANY
DIFFICULTIES. WE USE REGRESSION DIAGNOSTIC PROCEDURES TO UNCOVER ANY
LURKING NASTY SURPRISES.

THE SIMPLEST PROCEDURE IS TO PLOT THE RESIDUALS $e_i$ AGAINST THE
PREDICTOR $x_i$. REMEMBER, THE ERROR $\epsilon$ IS ASSUMED TO BE INDEPENDENT
OF $x$.

A RANDOM SCATTERPLOT INDICATES
THAT THE MODEL ASSUMPTIONS
ARE PROBABLY OK.

ANY PATTERN INDICATES A
DEFINITE PROBLEM WITH THE
MODEL ASSUMPTIONS.

A TYPICAL LURKING
NASTY SURPRISE (WHICH
EXISTS IN THE
HEIGHT/WEIGHT DATA)
IS THAT ERRORS ARE
HETEROSCEDASTIC: I.E.,
THE SPREAD OF $\epsilon$
INCREASES AS $x$
INCREASES.
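
WITHOUT A PLOT, A CRUDE NUMERIC STAND-IN IS TO COMPARE THE SIZE OF THE RESIDUALS FOR SMALL AND LARGE x. THE PYTHON SKETCH BELOW DOES THIS FOR THE NINE RIGGED POINTS. IT IS ONLY AN ILLUSTRATION: THE SPLIT AT $\bar{x}$ AND THE VARIABLE NAMES ARE ASSUMPTIONS, AND NINE POINTS ARE FAR TOO FEW TO ESTABLISH HETEROSCEDASTICITY.

```python
heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]
a, b = -200.0, 5.0  # fitted regression line from the text

# Residuals e_i = y_i - (a + b*x_i).
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]

x_bar = sum(heights) / len(heights)
# Absolute residuals for below-average and above-average heights.
low = [abs(e) for x, e in zip(heights, residuals) if x < x_bar]
high = [abs(e) for x, e in zip(heights, residuals) if x > x_bar]

mean_abs_low = sum(low) / len(low)
mean_abs_high = sum(high) / len(high)
print(mean_abs_low, mean_abs_high)  # prints: 19.0 24.25
```

A LARGER TYPICAL RESIDUAL AT LARGER x IS THE KIND OF PATTERN A RESIDUAL PLOT WOULD REVEAL AT A GLANCE.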


"TAKE TWO ASPIRIN AND REVISE THE MODEL."

IN THIS CHAPTER, WE
HAVE SUMMARIZED
THE BASIC IDEAS AND
TECHNIQUES OF
REGRESSION
ANALYSIS, THE STUDY
OF STATISTICAL
RELATIONSHIPS
BETWEEN VARIABLES.
THIS CONCLUDES OUR
DETAILED DISCUSSION
OF BASIC STATISTICAL
METHODS. IN OUR
FINAL CHAPTER, WE'LL
BRIEFLY REVIEW A
FEW REMAINING
TOPICS AND ISSUES.
♦ Chapter 12 ♦


CONCLUSION


THE BASIC PRINCIPLES, TOOLS, AND
CALCULATIONS COVERED IN THIS BOOK CAN
BE EXTENDED TO SOLVE MORE COMPLEX
PROBLEMS. HERE'S A GLIMPSE OF
MORE ADVANCED STATISTICAL METHODS:


DATA DISPLAY

WE SAW HOW TO DISPLAY ONE VARIABLE WITH A DOT PLOT AND TWO
VARIABLES USING A SCATTERPLOT, BUT HOW DO WE GRAPHICALLY DISPLAY
MORE THAN TWO VARIABLES ON A FLAT PAGE? AMONG THE MANY
POSSIBILITIES, A CARTOON GUIDE HAS TO MENTION HERMAN CHERNOFF'S
SIMPLE IDEA. USING THE HUMAN FACE, ASSIGN EACH FEATURE TO A VARIABLE
AND DRAW THE RESULTING CHERNOFF FACES:




x = EYEBROW SLANT
y = EYE SIZE
z = NOSE LENGTH
t = MOUTH LENGTH
— FACE HEIGHT 
ETC.


Statistical analysis of

MULTIVARIATE DATA


AN ASSORTMENT OF MULTIVARIATE MODELS HELP TO ANALYZE AND DISPLAY
n-DIMENSIONAL DATA. SOME MULTIVARIATE TECHNIQUES:

Cluster analysis

SEEKS TO DIVIDE THE
POPULATION INTO
HOMOGENEOUS SUBGROUPS.

FOR EXAMPLE, BY ANALYZING
CONGRESSIONAL VOTING
PATTERNS, WE FIND THAT
REPRESENTATIVES FROM THE
SOUTH AND WEST FORM TWO
DISTINCT CLUSTERS.




Discriminant analysis

IS THE REVERSE PROCESS. FOR EXAMPLE, A COLLEGE ADMISSIONS OFFICE MIGHT
LIKE TO FIND DATA GIVING ADVANCE WARNING WHETHER AN APPLICANT WILL GO
ON TO BE A SUCCESSFUL GRADUATE (DONATES HEAVILY TO THE ALUMNI FUND)
OR AN UNSUCCESSFUL ONE (GOES OUT TO DO GOOD IN THE WORLD AND IS
NEVER HEARD FROM AGAIN).



Factor analysis

SEEKS TO EXPLAIN HIGH-
DIMENSIONAL DATA WITH A
SMALLER NUMBER OF
VARIABLES. A PSYCHOLOGIST
MAY GIVE A TEST WITH 100
QUESTIONS, WHILE SECRETLY
ASSUMING THAT THE
ANSWERS DEPEND ON ONLY
A FEW FACTORS:
EXTROVERSION,
AUTHORITARIANISM, ALTRUISM,
ETC. THE TEST RESULTS
WOULD THEN BE SUMMARIZED
USING ONLY A FEW
COMPOSITE SCORES IN
THOSE DIMENSIONS.
THERE IS ALSO MORE TO

PROBABILITY:

Random walks BEGIN WITH
A COIN FLIP. SUPPOSE YOU MOVE AHEAD
ONE STEP FOR A HEAD AND BACK ONE STEP
FOR A TAIL. (USING TWO COINS, YOU CAN
DO THIS IN TWO DIMENSIONS.) REPEATED
FLIPS PRODUCE A STOCHASTIC PROCESS
CALLED A RANDOM WALK. RANDOM WALK
MODELS ARE USED IN STOCK OPTION
TRADING AND PORTFOLIO MANAGEMENT.
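
A ONE-DIMENSIONAL RANDOM WALK TAKES ONLY A FEW LINES TO SIMULATE. IN THIS PYTHON SKETCH THE FUNCTION NAME `random_walk`, THE NUMBER OF STEPS AND THE SEED ARE ASSUMPTIONS; A FAIR COIN IS MODELLED BY `rng.random() < 0.5`.

```python
import random


def random_walk(n_steps, rng):
    """Return the positions visited by a coin-flip random walk from 0."""
    position = 0
    path = [position]
    for _ in range(n_steps):
        # Heads: one step ahead. Tails: one step back.
        position += 1 if rng.random() < 0.5 else -1
        path.append(position)
    return path


rng = random.Random(0)  # fixed seed so the run is repeatable
path = random_walk(100, rng)
print(f"final position after 100 flips: {path[-1]}")
```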




Time series analysis DEALS WITH DATA SETS WHICH,
LIKE THE RANDOM WALK, ACCUMULATE OVER TIME: LOCAL AND GLOBAL
TEMPERATURES, THE PRICE OF OIL, ETC. IN TIME SERIES ANALYSIS, RANDOM
MODELS ARE USED TO FORECAST FUTURE VALUES.
WE'VE ALREADY SEEN HOW THE COMPUTER HELPS WITH ANALYSIS AND
ARITHMETIC. THERE ARE ALSO SOME STATISTICAL IDEAS THAT OWE THEIR VERY
EXISTENCE TO THE COMPUTER:

Image analysis

A COMPUTER IMAGE MIGHT CONSIST OF 1000 BY 1000 PIXELS, WITH EACH DATA
POINT REPRESENTED FROM A RANGE OF 16.7 MILLION COLORS AT ANY PIXEL.
STATISTICAL IMAGE ANALYSIS SEEKS TO EXTRACT MEANING FROM "INFORMATION"
LIKE THIS.



Resampling

SOMETIMES, STANDARD ERRORS AND CONFIDENCE LIMITS ARE IMPOSSIBLE TO
FIND. ENTER RESAMPLING, A TECHNIQUE THAT TREATS THE SAMPLE AS THOUGH
IT WERE THE POPULATION. THESE TECHNIQUES GO BY SUCH NAMES AS
RANDOMIZATION, JACKKNIFE, AND BOOTSTRAPPING.
TO DO RESAMPLING, THE COMPUTER

1. RESAMPLES THE SAMPLE
2. COMPUTES THE ESTIMATE FOR THE RESAMPLE
3. REPEATS THE FIRST TWO STEPS MANY TIMES, FINDING
   THE SPREAD OF THE RESAMPLED ESTIMATES

REMEMBER THE CORRELATION COEFFICIENT r OF THE 92 HEIGHT-WEIGHT PAIRS
OF CHAPTER 11? WHAT'S THE STANDARD ERROR OF r? THE COMPUTER TAKES
200 BOOTSTRAP SAMPLES FROM THE 92 DATA POINTS, COMPUTES r EACH TIME,
AND PLOTS A HISTOGRAM OF THE r VALUES.

[FIGURE: HISTOGRAM OF BOOTSTRAPPED CORRELATIONS]

NOTE THAT THE SPREAD OF THE BOOTSTRAP ESTIMATES IS RELATIVELY SMALL.
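
THE RESAMPLING STEPS ABOVE CAN BE SKETCHED IN PYTHON. THE 92 HEIGHT-WEIGHT PAIRS ARE NOT REPRODUCED IN THE TEXT, SO THIS SKETCH USES THE NINE RIGGED POINTS FROM THE REGRESSION CHAPTER AS A STAND-IN; ITS NUMBERS THEREFORE WILL NOT MATCH THE HISTOGRAM. THE FUNCTION NAMES `pearson_r` AND `bootstrap_r` AND THE SEED ARE ASSUMPTIONS.

```python
import random
import statistics

heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]


def pearson_r(xs, ys):
    """Correlation coefficient r = SSxy / sqrt(SSxx * SSyy)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xy / (ss_xx * ss_yy) ** 0.5


def bootstrap_r(xs, ys, n_boot, rng):
    """Resample (x, y) pairs with replacement and compute r each time."""
    n = len(xs)
    estimates = []
    while len(estimates) < n_boot:
        picks = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in picks]
        by = [ys[i] for i in picks]
        # Skip degenerate resamples in which r is undefined.
        if len(set(bx)) > 1 and len(set(by)) > 1:
            estimates.append(pearson_r(bx, by))
    return estimates


rng = random.Random(0)  # fixed seed so the run is repeatable
r_values = bootstrap_r(heights, weights, 200, rng)

# The spread of the resampled estimates approximates the standard error of r.
standard_error = statistics.stdev(r_values)
print(f"r = {pearson_r(heights, weights):.2f}, bootstrap SE = {standard_error:.2f}")
```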






DATA QUALITY


SEEMINGLY SMALL ERRORS IN
SAMPLING, MEASUREMENT, AND DATA
RECORDING CAN PLAY HAVOC WITH ANY
ANALYSIS. R. A. FISHER, GENETICIST
AND FOUNDER OF MODERN STATISTICS,
NOT ONLY DESIGNED AND ANALYZED
ANIMAL BREEDING EXPERIMENTS, HE
ALSO CLEANED THE CAGES AND
TENDED THE ANIMALS, BECAUSE HE
KNEW THAT THE LOSS OF AN ANIMAL
WOULD INFLUENCE HIS RESULTS.
Innovation


THE BEST SOLUTIONS ARE NOT ALWAYS IN THE BOOK! FOR EXAMPLE, A
COMPANY HIRED TO ESTIMATE THE COMPOSITION OF A GARBAGE DUMP WAS
FACED WITH SOME INTERESTING PROBLEMS NOT FOUND IN YOUR STANDARD
TEXT...

"HOW DO YOU GET A SIMPLE SAMPLE OF THIS?"



Communication

BRILLIANT ANALYSIS IS WORTHLESS UNLESS THE RESULTS ARE CLEARLY
COMMUNICATED IN PLAIN LANGUAGE, INCLUDING THE DEGREE OF STATISTICAL
UNCERTAINTY IN THE CONCLUSIONS. FOR INSTANCE, THE MEDIA NOW MORE
REGULARLY REPORT THE MARGIN OF ERROR IN THEIR POLLING RESULTS.


Teamwork

IN OUR COMPLEX SOCIETY, THE SOLUTION TO MANY PROBLEMS REQUIRES A
TEAM EFFORT. ENGINEERS, STATISTICIANS, AND ASSEMBLY LINE WORKERS ARE
COOPERATING TO IMPROVE THE QUALITY OF THEIR PRODUCTS. BIOSTATISTICIANS,
DOCTORS, AND AIDS ACTIVISTS ARE NOW WORKING TOGETHER TO DESIGN
CLINICAL TRIALS TO MORE RAPIDLY EVALUATE THERAPIES.

WELL, THAT'S IT! BY NOW, YOU SHOULD BE ABLE TO DO
ANYTHING WITH STATISTICS, EXCEPT LIE, CHEAT, STEAL,
AND GAMBLE.
STATISTICAL SOFTWARE:

IN THIS BOOK, WE USED THE MINITAB STATISTICAL SOFTWARE SYSTEM (MINITAB INC.,
STATE COLLEGE, PA). THE PENN STATE STUDENT HEIGHT AND WEIGHT DATA IS FROM THE
PULSE DATA SET ON THIS SYSTEM. COMPUTER GRAPHICS WERE GENERATED BY S-PLUS
(STATISTICAL SCIENCES INC., SEATTLE, WA) ON A 486 PC CLONE. S IS SOPHISTICATED SOFTWARE
DEVELOPED BY AT&T BELL LABS FOR ADVANCED ANALYSIS AND GRAPHICAL DISPLAYS.


RYAN, BARBARA, JOINER, BRIAN, AND RYAN, THOMAS,
MINITAB HANDBOOK (PWS-KENT, BOSTON, 1985) AND
THE STUDENT EDITION OF MINITAB (ADDISON-
WESLEY) ARE FAST, INEXPENSIVE INTRODUCTIONS TO
STATISTICAL COMPUTING. MINITAB RUNS ON MAIN-
DATADESK (DATA DESCRIPTION, ITHACA, NY), FOR THE
MACINTOSH

SAS (SAS INSTITUTE INC., CARY, NC), SPSS (SPSS INC.,
CHICAGO, IL), AND BMDP (BMDP STATISTICAL SOFTWARE,
INC., LOS ANGELES, CA) WERE ORIGINALLY DESIGNED FOR
THESE PACKAGES DIFFER IN IMPORTANT DETAILS. YOU NEED TO BE A SMART SHOPPER.
WE RECOMMEND CHOOSING A SYSTEM THAT YOUR COLLEAGUES HAVE ALREADY TESTED.
FEW OF US ARE CUT OUT TO BE STATISTICAL SOFTWARE PIONEERS. WHEN LEARNING A
NEW SYSTEM, EXPERIMENT WITH SMALL, FAMILIAR DATA SETS. REMEMBER, THE MOST
EXPENSIVE PART OF ANY SOFTWARE IS YOUR TIME. THE CARTOON RULE FOR
LEARNING STATISTICAL COMPUTING IS: FAMILIARITY BREEDS RESULTS.


TRYING TO LEARN STATISTICAL THEORY AND
STATISTICAL COMPUTING AT THE SAME TIME IS
A LITTLE LIKE TRYING TO WALK AND CHEW
GUM AT THE SAME TIME. DIFFERENT SKILLS AND
THOUGHT PROCESSES ARE INVOLVED IN EACH.
SET ASIDE SEPARATE TIMES TO LEARN THESE
SUBJECTS, THEN BRING THEM TOGETHER. IN
THIS WAY, YOU CAN BECOME A CHEWING,
WALKING, COMPUTING RENAISSANCE
STATISTICIAN!
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Acceptance sampling, 150 

Addition rule for events, 38-39, 42, 44 

Alternate hypothesis (H.), 140-141. 147 149. 

152 153. 165—166. See also Hypothesis 
testing 

lefthander!. 144 145 
relevant. 144-145 
right-handed. 144-145 
two-handed, 144—145 
Analysis of variance. See ANOVA 
A NOV A (analysis of variance), 186. 193-195. 
table, 194 

Approximate probability, 60 
A pproxintafion 

binomial. 79-81,86-88 
continuous. 87 88 
normal, 87-88 

Archery lessons, conlidcnce intervals and, 116-124 
Area under the curve. 64-66 
Arrays, 14 15 

Aspirin clinical trials. 160-167. See also Two 
populations compared 
Astralagi, 28 

Average salary comparison, 168-169. 

See also Two populations compared 
Average squared distance. 22 
Avetage value, 15-17 

standard deviations from, 22,24-25. 168. 171 

Bar graphs. I 1 
Bayes. Joe. 46-50 
Bayes, Rev. Thomas, 46-50 
Bayesian, 35 
Bayes Theorem, 46 50 
Bernoulli. James, 79 
Bernoulli trial, 74—75. 78 
sampling size and, 98-100 
Best linear unbiased estimators (BLUR), in 
regression analysis. 201-202 
Beta (probability of type II error). 151-155 
Bias 

in polls. 126-127 

reducing natural, w ith paired comparison, 178 
m simple random sampling, steps to eliminate. 
167 


Binomial approximation, 79 81.86-88 
Binomial coefficient. 76 
multiplication rule and. 76 
Pascal’s triangle and, 77 
Binomial distribution, 77, 81,83. 86, 88 
asymmetrical. 82 

calculating, for large values, 79-80 
continuous density function and. 79-80 
mean of. 78 

standard normals and, 82 
variance of. 78 

Binomial distribution table. 78 
Binomial probability distribution, 77-78 
Binomial random variables. 74 76. 139-140 
Blocks 

complete randomized. 184-185 
in experimental design. 183-184 
BLUR (best linear unbiased estimators), in 
regression analysis, 201-202 
Bootstrapping. 215-216 
Box and whiskers plot, 21 
Brass tacks. 98 103 

Categorical statements. 2 
Central limit theorem. 106. 128. 169 
fuzzy. 83—88 
problems with. 107 
Central value, 14. See also Spread 
mean, 15—16 
median. 17-18 
Challenger (space shuttle). 3 
Chameleon Motors 

comparing small sample means, 170-171 
confidence intervals. 134- 135 
hypothesis testing for. 149-150 
Chernoff, Herman. 212 
Classical probability. 35 
Claudius I, 28 
Cluster analysis, 212 
Cluster sampling tlestgn. 95 
Coin toss. 32. 54-55, 58. 60-62. 68 70 
Communication, 218 
Comparing failure rates, 160 163 
Comparing small sample means. 170-171 
Comparing success Mies. 160-163 





Comparing iwo populations. See Two 
populations compared 
Compansonof average salaries, 168-169 
Comparisons, paired, 174-178 
Complete randomized block, 184-185 
Computer image analysis, 215 
Computer resampling, 215-216 
Conditional probability. 40—11 
false positive paradox and, 46-50 
multiplication rule and, 42-44 
Confidence interval levels 
decision theory and. 152-153 
measuring, 122-123 
Confidence intervals, 112-136 

computer simulation of, for samples, 120 
error levels and. 124-127 
estimating, 114-127 
increasing levels of, 121-125 
margin of error and. 119. 121 
in paired comparisons. 176 
population means and. 128-130. 169 
population proportion and, 128-130 
probability calculation aiul, 117-119 
random sampling used for. 114-115. 119 
in regression arm lysis, 203-206 
sample means and, 130. 171 
standard deviation in. 117, 128-130 
standard error in, 118, 128-130 
Student’s f based, 131-136 
for success rates. 164 
table for levels. 122-123 
Continuity correction, 87-88 
Continuous densities, properties of, 66-67 
Continuous density function, binomial distribu- 
tion and, 79-80 
Continuous probabilities. 64 
Continuous random v ariables, 63 
mean of, 67 

probability density of. 65 
variance of, 67 

Correlation, squared, in regression analysis, 195 
Correlation coefficient, in regression analysis, 
196 

Cumulative probability, 84 
Curve, area under the, 64-66 

Data 

multivariate, statistical analysis of, 212-213 
order of. 17 

paired and unpaired compared, 177-178 
properties of. 59 

rigged, in regression analysis, 189. 192. 

194-195, 205 207 

spread of. in regression analysis, 190-195 
Data analysis. 4 
Data description. 8- 26 
Data display, 212 


Data points, 11-12. 14-15 
average, 17 
middle, 17 
Data quality. 217 
Data summary, 12 
Death rate, 13 

Decision table, two-by-two, 152 
Decision theory, hypothesis testing, 151-155 
Deductive reasoning, 113 
Degrees of freedom, 131-135 

in comparing small sample means, 171 
hypothesis testing and, 149-150 
de Merc, Chevalier, 28-29. 75, 78 
de Moivre, Abraham, 79-83, 86-88. 101 
Dependent random variable, in regression 
analyses, 199-209 

Dependent variable, in regression analysis. 189 
Dice. 28-45 
loaded, 33 

Discrete probabilities. 64, 66 
Discrete random variables, 63 
Discriminate analysis, 213 
Dot plots, 9 
two-dimensional. 188 

Election polls, 114 127 

hypothesis testing in, 143-145 
FJememary outcomes, 30. 32-38 
Error levels, confidence intervals anil, 124-127 
Errors 

hetcrosceilasiic, 209 

margin of, confidence intervals and, 119, 121 
measurement, experimental design and, 183 
random error fluctuations, 199-209 
standard. See Standard error (Sli) 
sum of squared (SSF.). in regression analysis, 
190-195 
type 1. 151-154 
type II. 151-154 
Estimates, 102-103, 107 
Estimating confidence intervals. I 14-127 
Estimators, 102-103 

best linear unbiased (BLUE), in regression 
analysis. 201-202 

in comparing Ihe means of two populations. 
168-169 
Events 

addition rule for, 38-39. 42, 44 
mutually exclusive. 39. 42, 44 
probability of. 35-37 
repeatable, 35 

rules tor outcomes of. 38-39 
subtraction rule for, 39, 44 
Expected value. 61 
Experiment 

random, 30, 32, 34. 36 

sampling and. 98-100, 104-105 
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bxpenment (connmietll 
weight. 9-12. 16. 18-26 

regression. 188-209. See also Regression 
Lx peri mental design 
basic principles. 183 
blocks in, 183-184 
elements of, 182-183 
four-by-four cable in. 184 185 
Latin square in. 184-185 
local control in, 183 
measurement error and, 183 
natural variability and. 183 185 
randomization in, 183, 185 
replication in, 183. 185 
total variability and, 186 
bxpcriinemal treatments, 182-183 
Lxperimental units, 182-183 

Factor analysis, 213 

Failure rales, comparing, for two populations. 
160-163 

False positive paradox. 46-50 
Fertnut. Pierre dc, 28—15 
Fisher. R. A.. 217 

Fitting process, in regression analysis. 189-196 
Fixed significance level, in hypothesis testing. 
141-142. 145 

Four-by-four table, in experimental design, 
184-185 

Frequency, relative. 10-11, 35, 57 58, 60 
Frequency histograms. 11. 57-58 
Frequency tables, intervals in, 10—I I 

Gallup Poll. 127 
Gambling. 27—45 
Gasoline comparisons, 172-173 
experiment design and, 182-186 
paired comparisons of. 174-178 
Cosset. William. 108-109. 131-132 
Graphic display. 13 
Graphs 
bar, 11 

histograms. See Histograms 
probability distribution. 56-58 

(H a V See Alternate hypothesis 
Hecerosccdastic errors. 209 
Histograms, 13 
frequency. 57 
probability, 56-58 
relative frequency. 11, 57-58 
spread measured in. 19 
symineirical. 24-25. 77 
Hite. Sherc, 97 
Holmes. Sherlock. 113-130 
H () (nu!l hypotheses), 140-141. 144-145, 

147-150. 152-153. 165-166. See also 


Hypothesis testing 

Hypotheses. See also Hypothesis testing 

alternate (H a ). 140-141. 147-149. 152-153. 
165-166 

left-handed. 144-145 
relevant. 144-145 
right-handed. 144-145 
two-handed, 144 145 
null <U 0 ). 140-141. 144 145. 147-150. 
152-153.165-166 
Hypothesis testing. 138-139 
decision theory, 151-155 
degrees of freedom and. 149-150 
fixed significance level in, 14 1-142, 145 
large sample 

for population mean. 146-148 
significance test for proportions, 143-145 
in paired comparisons, 176 
popuklion mean and. 146-148. 169 
probability statement in. 141 142 
in regression analysis. 207 
statistical. 140-142 

Iguana autos. 170-171 
Increments, 9 
Independence. 71,74 

simple random sampling and. 92 94. 96 
special multiplication rule and. 43-44 
Independent mechanisms. 71 
Independent variable, in regression analysis. 

189, 199-209 
Inductive reasoning. 113 
Innovation, 218 

Inspection sampling, significance lest used in, 
146-148 
Integral, b6-67 

Interquartile range (IQR). spread measured in. 
20-21 
Intervals 

confiilence. See Confidence intervals in a 
frequency table. 10—11 

IQR (interquartile range), spread measured in. 
20-21 

Jackknife. 215-216 

Jury selection, racial bias in. 138-141 

Large sample hypothesis testing 
fi>r population mean. 146-148 
significance lesi for proportions, 143-145 
Large values, calculating binomial distribution 
for, 79 80 

Latin square, in experimental design. 184-185 
Least squares line, 189-190. 208 
Left-handed alternate hypothesis. 144-145 
Linear regression, in regression analysis. 
189-190. 208 
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Local control, in experimental design, 183 
Logical operations, 37 

Mai gin of error, confidence intervals and. M9,121 
Mean, 15-16, 18 

of binomial distribution, 78 
central, 15-16 

comparing small sample, 170-171 
confidence intervals and. 128-130, 169. 171 
large sample test for, 146-148 
in paired comparisons, 175-176 
population, 59. 62. 80 

confidence intervals and, 128-130. 169 
hypothesis testing and, 146-148 
of probability distribution, 60-61 
of random variables, 61, 67-69 
sample 

comparing small. 170-171 
confidence intervals and. 130. 171 
distribution of, 104-106. 171 
hypothesis testing for, 146-148 
standard deviation from, 22, 24-25, 62, 168, 
171 

Mean response, predicting, in regression 
analysis, 204-206 

Measurement error, experimental design and, 183 

Measures of spread, 19-25 

Median, 17-18, 20-21 

Midpoints, 10-11 

Model properties. 59 

Models 

regression, 199-202 
stochastic random, 116-118 
for two populations, 162 
Monitoring programs 

power analysis in. 154-155 
probability of type 11 errors in, 151-155 
Mortality statistics. 13 
Multiple linear regression, in regression 
analysis. 208 
Multiplication rule. 45 

binomial coefficient and, 76 
conditional probability and, 42—44 
Multivariate data, statistical analysis of 
cluster. 212 
discriminate. 213 
factor, 213 

mu. See Population mean 
Mutually exclusive events. 39,42,44 

Natural bias, reducing, with paired comparison. 
178 

Natural variability 

experimental design and. 183-185 
reducing, with paired comparison, 178 
Nightingale, Florence, 13 
Non-linear regression, in regression analysis. 208 


Normal approximation, 87-88 
Normal distribution, standard. 79-85 
rule for computing, 85 
table to find. 84-85 

Null hypothesis (H 0 X 140-141, 144-145, 

147-150, 152-153, 165-166. See also 
Hypothesis testing 

Numerical outcome, sampling and. 98-100, 
104-105 

Numerical weight. 32 
ObjectiviM, 35 

Observed value of /, 149-150 

Observed value of;, hypothesis testing and. 

144-145. 165-166. 169 
Opportunity sampling. 97 
Opportunity sampling design. 97 
Order of data, 17 
Outcomes 

elementary, 30. 32-38, 41 
of events, rules for. 38-39 
numerical, sampling and, 98-100, 104-105 
Outliers, 18, 21-23 

Paired comparisons 
of gasolines, 174-178 
means in. 175-176 

paired and unpaired data compared. 177 178 
small-sample i test statistic for, 176 
standard deviation in. 175-176 
Pascal. Blaise, 29 
Pascal’s triangle. 77 
Personal probability, 35 
Polls 

bias in. 126-127 
election. 114-127 
error levels in, 124-127 
Gallup. 127 

hypothesis testing in, 143-145 
as opposed to actual elections, 126-127 
Pollution monitoring, probability of type II 
cnors in. 151-155 
Pool the sum of squares 

in comparing small sample means. 171 
Population. See at so Two populations compared 
properties, 59 
proportion. 128-130 
standard deviation, 59. 62. 80 
Population mean. 59, 62, 80. See also Two 
populations compared 
confidence intervals and, 128-130. 169 
hypothesis testing and, 146-148 
Power analysis in monitoring programs. 154-155 
Prediction line, 189 

Predictor variable, in regression analysis. 189 
Probabilities. 4, 27—51 
approximate, 60 
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Probabilities (continued) 

characteristic properties of, 34

classical, 35 

conditional 

false positive paradox and. 46-50 
multiplication rule and. 42-44 
continuous, 64
cumulative, 84
discrete, 64, 66

formulas for manipulating, 37-39 
non-negative, 34 
normal, 83-85
personal, 35

repeatable events and, 35
sample, 100
spread of, 67 

Probability calculation, confidence intervals
and, 117-119
Probability density, 66

of continuous random variable, 65
Probability distribution 
binomial, 77-78 
graphs, 56-58 
mean of, 60-61 
properties of, 59 
random variable, 55-58 
table to find normal, 84-85 
Probability graphs, 56-58 
Probability of type II errors, 151-155 
Probability statement, in hypothesis testing,
141-142

Probability zero, 63-64

Proportion of successes. See Success rates

Pseudo-random numbers, 65

P-value, in hypothesis testing, 141-142, 148

Random error fluctuations, in regression
analysis, 199-209
Random experiment, 30, 32, 34, 36
sampling and, 98-100, 104-105
Randomization, 215-216

in experimental design, 183, 185
Random models, stochastic, 116-118
Random number generator, 65, 94
Random sampling 

independence and, 92-94, 96
simple, 92-96, 167
steps to eliminate bias in, 167
used for confidence intervals, 114-115, 119
Random sampling design, 92-94
Random selection of jurors, 138-141 
Random variables, 53-72
adding, 68-71
binomial, 74-76, 139-140
discrete, 63
mean of, 61, 67-69
probability distribution, 55-58
sampling and, 98-100, 104-105
t, 107-109

variance of, 62, 67-71
Random variable t, 107-109
Random walk, 214 
Regression, 187-209 
Regression analysis 

best linear unbiased estimators (BLUE) in,
201-202

confidence intervals in, 203-206 
correlation coefficient in, 196 
dependent random variable in, 199-209
dependent variable in, 189
fitting process in, 189-196
hypothesis testing in, 207
independent variable in, 189, 199-209
linear regression in, 189-190, 208
predicting mean response in, 204-206
predictor variable in, 189
random error fluctuations in, 199-209
regression diagnostics in, 209
response variable in, 189
rigged data in, 189, 192, 194-195, 205-207
spread of data in, 190-195
squared correlation in, 195
standard error (SE) without derivation in, 203
statistical inference in, 199-209
student weight experiment and. 188-209 
sum of squared errors (SSE) in, 190-195 
sum of squared regression (SSR) in, 194-196
Regression coefficient, sample, 191-192
Regression line, 189-190, 208
Regression model, 199-202
Relative frequency, 10, 35, 60
Relative frequency histograms, 11, 57-58
Repeatable events, probability and, 35 
Replication in experimental design, 183, 185 
Resampling, 215-216

Response variable in regression analysis, 189 
Right-handed alternate hypothesis, 144-145 
Rounding off, 9 
Round numbers, 10

Salk polio vaccine, 3
Sample means 

comparing small, 170-171
confidence intervals and, 130, 171
distribution of, 104-106
hypothesis testing for the population mean,
146-148

Sample probability, 100
Sample properties, 59
Sample regression coefficient, 191-192
Sample size, 91

comparing small, 170-171
confidence levels and, 124-125
increasing, 124-125
standard error and, 98-105 
testing large, 143-148
Sample space, 30-31, 33, 41
Sample variance, 22 
Sampling, 89-109 
acceptance, 150 
random, 95 

independence and, 92-94, 96 
steps to eliminate bias in, 167
used for confidence intervals, 114-115, 119
random experiment and, 98-100, 104-105
random variables and, 98-100, 104-105
standard deviation and, 101-103
Sampling design 
cluster, 95 
opportunity, 97 
random, 92-94 
simple random, 92-94 
stratified, 95 
systematic, 96-97 
Sampling distribution 
of the mean, 104-106 
for proportion of successes, 163
Scatterplots, 188-189
random, 209 

SD. See Standard deviation 

SE. See Standard error 
Senator Astute. See Election polls 
Sigma, 16. See also Summary statistics 
Significance level 

fixed, 141-142, 145

in hypothesis testing, 141-142, 145, 147-148
in scientific work, 141-142 
Significance test 

for proportions, 143-145 
used in inspection sampling, 146-148 
Simple random sampling, 92-96, 167. See also
Random sampling

Smoke-detectors, as a decision theory example,
151-154

Special addition rule, for mutually exclusive 
events, 39, 42, 44 
Special multiplication rule 

conditional probability and, 42-44 
independence and, 43-44
Spinning pointer, 63-64
Spread, 14

of data in regression analysis, 190-192
sum of squared errors relative to, 193-195
measures of, 19-25 
of probabilities, 67 
variance in, 22-23 
Spread distance, squares of, 22 
Squared correlation, in regression analysis, 195 
Squared distance, 22, 61-62 
Squared errors, sum of (SSE), in regression
analysis, 190-195

Squared regression, sum of (SSR), in regression
analysis, 194-196

Square root, standard deviation defined by, 23
Squares, pool the sum of, 171
SSE (sum of squared errors)
in regression analysis, 190-195
relative to spread of data, 193-195
SSR (sum of squared regression), in regression
analysis, 194-196
Standard deviation (SD)

in comparing small sample means, 171 
in comparing the means of two populations,
168

in confidence intervals, 117, 128-130
defined by square root, 23
from mean values, 22, 24-25, 168, 171
in paired comparisons, 175-176
population, 59, 62, 80
sampling and, 101-103, 107
spread measures and, 22
z-scores and, 24-25
Standard error (SE) 

in comparing small sample means, 171 
in comparing the means of two populations, 
168 

in confidence intervals, 118, 128-130
sample size and. 98-103 
without derivation, in regression analysis, 203 
Standard normal distribution, 79-82 
table for, 84-85

Statistical analysis of multivariate data, 
212-213 

Statistical hypothesis testing, 140-142, 144-145,
147-148, 165-166, 169
Statistical inference, 4 

in regression analysis, 199-209 
Statistical situations, 158-159
Statistics 

mortality, 13 
summary, 14-26, 148 
Stem-and-leaf diagram, 12, 18
Stochastic random models, 116-118
Stratified sampling design, 95
Student's t. See t-distribution
Subjectivist, 35

Subtraction rule for events, 39, 44
Successes, number of, 75
Success rates, 99 

comparing, for two populations, 160-163 
confidence intervals for, 164
in hypothesis testing, 143-145
sampling distribution for, 163
Summary statistics, 14-26
in hypothesis testing, 148 
Summation, 16. See also Summary statistics
Sum of squared errors (SSE)
in regression analysis, 190-195
relative to spread of data, 193-195 
Sum of squared regression (SSR), in regression 
analysis, 194-196
Sum of squares, pool the, 171 
Systematic sampling design, 96-97 

t-distribution, 107-109

in comparing small sample means, 171
confidence intervals based on, 131-136
critical values for, 132-136, 150
hypothesis testing and, 149-150
Teamwork, 218 
Test statistic 

in hypothesis testing, 140-141, 144-145,
147-148, 165-166, 169
small-sample t, for paired comparisons, 176
Time series analysis, 214-215
t-observed value, 149-150
Total variability 

due to the regression, 194-195
experimental design and, 186
Tukey, John, 12, 21
t-values. See t-distribution
Two-by-two decision table, 152
Two-handed alternate hypothesis, 144-145
Two populations compared, 158-179. See also
Population 

confidence intervals for, 164, 169
hypothesis testing, 160-163, 169 
mean of, 168-169 
model for, 162 

sampling distribution for proportion of 
successes, 163 
success rates, 160-164
Type I errors, 151-154
Type II errors, 151-155
Typical value, 14-18. See also Spread


Variability 

natural 

experimental design and, 183-185 
reducing, with paired comparison, 178 
total 

due to the regression, 194-195
experimental design and, 186 
Variables 

binomial random, 74-76, 139-140
continuous random, 63, 65, 67
dependent, 189
dependent random, 199-209
discrete random, 63
random. See Random variables 
in regression analysis, 189, 199-209 
Variance 

analysis of. See ANOVA 
of binomial distribution, 78 
of continuous random variables, 67
of random variables, 62, 67-71
sample, 22 
in spread, 22-23 
Vertical scale, 11 

Weight experiment, Penn State student, 9-12,
16, 18-26, 188-209. See also Regression;
Regression analysis

x-axis, 80 

y-axis, 80 

Z-observed value, hypothesis testing and,
144-145, 165-166, 169
z-scores, standard deviation and, 24-25 
z transformation, 84-88, 117-118